Malicious actors are always looking for ways to break into systems and exploit them. With AI, the latest trend is to target the Model Context Protocol (MCP) through a variety of strategies.
In recent years, most cyberattacks have been financially motivated, but AI has taken things to a new level. The technology can now be (mis)used to spread lies and fake news for a person’s or group’s gain. The latest scandal involves Grok, xAI’s chatbot, which started acting erratically, answering questions with unsolicited information about “white genocide” in South Africa. According to The Guardian, Elon Musk’s company blames an “unauthorised change” for the chatbot’s rant about “white genocide”. The case is still under investigation, and it isn’t clear who was behind the anomaly, but it is a good example of how AI can be tampered with and made to go wrong.
In this article, we explore how hackers are doing this and what cybersecurity specialists can do to avoid losses and mitigate further issues.
What does MCP stand for?
The Model Context Protocol works as a communication layer between AI applications and the external world: tools, services, and templates. Put very simply, MCP is what allows a chatbot like ChatGPT to give you the right answers to your questions, with up-to-date information retrieved from the proper sources. However, MCP isn’t just a bridge; it also acts as a behavior and decision-making manager.
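To make the idea concrete, here is a minimal, purely illustrative Python sketch of what such a protocol layer does: it sits between the model and a registry of external tools and decides which tool a request is routed to. This is not the real MCP implementation (which speaks a JSON-RPC-based protocol), and the `weather_lookup` tool is a hypothetical example.

```python
# Illustrative sketch only: a registry that sits between a model and external
# tools, deciding which tool a request is routed to. Real MCP servers speak a
# JSON-RPC-based protocol; the tool below is hypothetical.

from typing import Callable, Dict


class ToolRegistry:
    """Maps tool names to callables the model is allowed to invoke."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, argument: str) -> str:
        # This layer is a natural choke point: if an attacker can tamper with
        # what is registered here, every downstream answer is affected.
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        return self._tools[name](argument)


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register("weather_lookup", lambda city: f"Sunny in {city} (stub data)")
    print(registry.call("weather_lookup", "Lisbon"))
```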
Hackers have realized that tampering with the way MCP works is an effective way to compromise entire AI services. Let’s look at the most common strategies they use to attack this technology.
Main Threats to AI LLMs & Agents
Let’s say that someone “poisons” the system, and your go-to AI LLM starts giving you all the wrong answers, or results that differ from the ones you requested. This can seriously harm the AI’s operations, rendering it unusable until the issue is solved. The person who poisoned the system can then demand money in exchange for restoring it to its original state.
Data poisoning attacks involve injecting malicious data into the training dataset, which can skew the model’s behavior and decision-making processes. How to prevent it? By vetting and monitoring training data sources — because that’s where AI tools “learn” how to act and what content to share — by using robust training methods that can identify anomalous data points, and by applying data sanitization tools to cleanse the datasets before training begins.
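As a concrete illustration of that last step, the sketch below flags anomalous training samples before they reach the model. It assumes scikit-learn is available; the data, contamination rate, and thresholds are arbitrary example values, not a production-ready pipeline.

```python
# Illustrative data-sanitization step: flag suspected poisoned samples with an
# outlier detector before training. Assumes NumPy and scikit-learn; the
# contamination rate is an arbitrary example value.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(500, 4))     # legitimate samples
poisoned = rng.normal(loc=6.0, scale=0.5, size=(10, 4))   # injected outliers
dataset = np.vstack([clean, poisoned])

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(dataset)                     # -1 marks suspected outliers

sanitized = dataset[labels == 1]
print(f"Kept {len(sanitized)} of {len(dataset)} samples; "
      f"dropped {int(np.sum(labels == -1))} suspected poisoned points")
```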
Adversarial attacks come in the form of changes to the parameters or architecture of the LLMs themselves, with the goal of deceiving AI systems into producing incorrect information or unintended predictions. Malicious actors do this by inputting data that has been subtly altered to deceive the AI, often with changes no human would notice. Who benefits from it? Either hackers who hold the system to ransom, demanding money to restore normal operations, or actors who wish to disrupt an industry or a community by spreading false information, for example. How to prevent it? During model training, include examples of adversarial inputs so the system learns to detect them. You can also use ensemble models to make it harder for a single perturbation to affect all components. Adversarial attacks often target images, so techniques like image smoothing or noise filtering also help.
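The snippet below is a minimal sketch of the noise-filtering idea for images, assuming NumPy and SciPy are available. The “adversarial” perturbation here is just random noise standing in for a crafted pattern, and the filter size is an arbitrary example value.

```python
# Illustrative pre-processing defence: small adversarial perturbations are often
# high-frequency, so a median filter can blunt them before the image reaches the
# classifier. The perturbation here is random noise used as a stand-in.

import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(1)
image = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)   # smooth stand-in image
perturbation = rng.normal(0.0, 0.05, size=image.shape)
adversarial = np.clip(image + perturbation, 0.0, 1.0)

smoothed = median_filter(adversarial, size=3)            # defence step

print("Mean absolute perturbation before filtering:",
      float(np.mean(np.abs(adversarial - image))))
print("Mean absolute perturbation after filtering: ",
      float(np.mean(np.abs(smoothed - image))))
```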
Inversion attacks pose a great risk of sensitive data exposure. They are carried out by reconstructing the training data hidden behind a model: by analyzing the outputs of machine learning models, specialized hackers can work backward, essentially unraveling a puzzle. Once that puzzle is solved, they can access the sensitive data.
Extraction attacks happen when cybercriminals clone an entire AI system by making queries and observing the responses. Experienced hackers don’t need to look inside the machine to know how it works: they only have to study its behavior to replicate it. How to prevent them? In this case, privacy is key. If specialists limit access to the model’s outputs and APIs, there are fewer gateways. Using differential privacy techniques to obscure the influence of individual data points can also help.
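Here is a small sketch of both mitigations under simplifying assumptions: a per-client query limit to starve extraction attempts, and Laplace noise added to numeric outputs in the spirit of differential privacy. The epsilon, sensitivity, and query budget are illustrative values, not recommendations.

```python
# Illustrative mitigations: (1) rate-limit queries per client, and (2) add
# calibrated Laplace noise to numeric outputs so individual data points are
# harder to reconstruct. All parameters below are example values.

import numpy as np

rng = np.random.default_rng(2)
query_counts: dict[str, int] = {}
MAX_QUERIES_PER_CLIENT = 100


def private_mean(values: np.ndarray, client_id: str,
                 epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Return a noised mean of `values` for a rate-limited client."""
    # Basic rate limiting: fewer queries means less signal for extraction attacks.
    query_counts[client_id] = query_counts.get(client_id, 0) + 1
    if query_counts[client_id] > MAX_QUERIES_PER_CLIENT:
        raise RuntimeError("query budget exhausted for this client")

    true_mean = float(np.mean(values))
    noise = rng.laplace(loc=0.0, scale=sensitivity / (epsilon * len(values)))
    return true_mean + noise


if __name__ == "__main__":
    data = np.array([0.2, 0.4, 0.9, 0.5, 0.7])
    print(private_mean(data, client_id="client-42"))
```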
Prompt injection and jailbreaking are attacks unique to language model-based AI systems. These attacks manipulate how the AI interprets or generates content by exploiting its reliance on text prompts for instructions.
Jailbreaking is the deliberate crafting of prompts to bypass a bot’s built-in safety measures, which filter hate speech, private data disclosure, illegal activities, and violence. These techniques “confuse” the AI, potentially leading to dangerous outcomes, such as teaching someone how to make a bomb or the best way to kidnap someone.
How to prevent it? Regularly testing the AI system with new attack techniques to discover and patch vulnerabilities goes a long way. So does filtering and cleaning user inputs to strip out malicious prompts, and limiting how much prompt history is reused. Cybersecurity teams should also put several layers of security in place at different stages, as in the sketch below.
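The sketch shows one such layer in deliberately simple form: screening user input against known injection phrases and capping how much history is replayed. Real deployments pair this with trained classifiers and model-side guardrails; the blocked patterns and history limit are arbitrary examples.

```python
# Deliberately simple defence layer: screen user input before it reaches the
# model and cap how much prior history is replayed. The patterns and limit
# below are arbitrary examples, not a complete filter.

import re

BLOCKED_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"reveal (the )?system prompt",
]
MAX_HISTORY_MESSAGES = 10  # limit how much prompt history is reused


def screen_input(user_message: str) -> str:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, user_message, flags=re.IGNORECASE):
            raise ValueError("Input rejected by prompt-injection filter")
    return user_message.strip()


def build_prompt(history: list[str], user_message: str) -> list[str]:
    safe_message = screen_input(user_message)
    return history[-MAX_HISTORY_MESSAGES:] + [safe_message]


if __name__ == "__main__":
    print(build_prompt([], "What's the weather like in Lisbon?"))
```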
Misusing or abusing AI platforms can lead to spam attacks, deepfakes, and misinformation. There are, of course, ways to tell whether a certain image is real or just a very good copy (a deepfake), but not everyone can spot the difference. AI tools are becoming more and more capable, and they are used for both good and bad. High-quality deepfakes of celebrities and politicians can be used to cause serious political issues, to exploit people, and to spread fake news, which is a serious problem plaguing today’s society. How to prevent it? By having companies and governments impose rules on the usage of public AI services, watermarking generated content, and collaborating closely with the authorities to track misuse. Meta (owner of Instagram and Facebook), for example, requires users to disclose when they’re publishing AI-generated content, mostly for videos and images, where it can be hard for other users to spot the difference.
Supply chain attacks target vulnerabilities in third-party dependencies, such as open-source libraries or pre-trained models. A malicious update to a popular machine learning library might introduce a backdoor that allows remote access to every AI system using it, compromising data and operations. How to prevent it? Auditing and verifying the integrity of third-party software and datasets, together with cryptographic signatures to validate component authenticity, are the most common tools.
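As an example of the integrity-verification step, the sketch below checks a downloaded artefact (say, a model weights file) against a known-good SHA-256 digest before it is loaded. The file name and expected digest are placeholders.

```python
# Illustrative supply-chain safeguard: verify a downloaded artefact against a
# known-good checksum before loading it. File name and digest are placeholders.

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(f"Integrity check failed for {path}: {actual}")


if __name__ == "__main__":
    # Placeholder values -- use the artefact and digest published by its maintainers.
    verify_artifact(Path("model-weights.bin"), expected_sha256="0" * 64)
```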
AI systems often take external data as input and generate responses or actions based on it. When these inputs and outputs aren’t carefully validated, they can become a vector for classic web vulnerabilities such as SQL injection (code injection aimed at manipulating or destroying databases), cross-site scripting (XSS), or command injection. These issues arise when untrusted input is processed in unsafe ways, either by the AI itself or by the systems that interact with it, and can result in data breaches through unauthorized access or deletion, as well as the compromise of other systems the AI interacts with.
How to prevent it? Enforce strict validation and sanitization of inputs and outputs, use secure programming frameworks and libraries, and conduct regular code reviews and security testing.
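For instance, a parameterised database query is a simple way to keep untrusted (possibly AI-generated) text from being interpreted as SQL. The in-memory table and column names below are illustrative.

```python
# Illustrative input-handling defence: never splice untrusted text into a SQL
# statement; pass it as a bound parameter so the driver treats it as data.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

untrusted = "alice' OR '1'='1"   # classic injection payload

# Unsafe pattern (for contrast only): f"SELECT email FROM users WHERE name = '{untrusted}'"
# Safe pattern: the placeholder keeps the payload from being executed as SQL.
rows = conn.execute("SELECT email FROM users WHERE name = ?", (untrusted,)).fetchall()
print(rows)   # [] -- the payload matches nothing instead of dumping the table
```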
Undetectable menaces
While AI technology offers immense potential, it also introduces new cybersecurity challenges. By understanding the threat landscape and implementing layered, proactive defense strategies, organizations can protect their AI systems against a growing array of threats. We’re witnessing a time when new threats can deceive everyone, from older to younger generations, no matter how much one knows about posting a photo on social media or formatting Excel sheets. AI LLMs and bots are new to everyone, so it’s important to stay vigilant, only access trusted platforms, and report potential threats to the authorities.
Tech-savvy people and cybersecurity specialists should stay alert and keep learning about new ways to prevent and mitigate issues, as stated above.
Integritee is the most scalable, privacy-enabling network with a Parachain on Kusama and Polkadot. Our SDK solution combines the security and trust of Polkadot, the scalability of second-layer Sidechains, and the confidentiality of Trusted Execution Environments (TEE), special-purpose hardware based on Intel Software Guard Extensions (SGX) technology, inside which computations run securely, confidentially, and verifiably.