Researchers Demonstrate How MCP Prompt Injection Can Be Used For Both Attack And Defense
As the field of artificial intelligence (AI) continues to evolve at a rapid pace, new research has found how techniques that render the Model Context Protocol (MCP) susceptible to prompt injection attacks could be used to develop security tooling or identify malicious tools, according to a new report from Tenable.
MCP, launched by Anthropic in November 2024, is a framework designed to connect Large Language Models (LLMs) with external data sources and services, and make use of model-controlled tools to interact with those systems to enhance the accuracy, relevance, and utility of AI applications.
It follows a client-server architecture, allowing hosts with MCP clients such as Claude Desktop or Cursor to communicate with different MCP servers, each of which exposes specific tools and capabilities.
While the open standard offers a unified interface to access various data sources and even switch between LLM providers, they also come with a new set of risks, ranging from excessive permission scope to indirect prompt injection attacks.
For example, given an MCP for Gmail to interact with Google's email service, an attacker could send malicious messages containing hidden instructions that, when parsed by the LLM, could trigger undesirable actions, such as forwarding sensitive emails to an email address under their control.
MCP has also been found to be vulnerable to what is called tool poisoning, wherein malicious instructions are embedded within tool descriptions that are visible to LLMs, and rug pull attacks, which occur when an MCP tool functions in a benign manner initially, but mutates its behavior later on via a time-delayed malicious update.
"It should be noted that while users are able to approve tool use and access, the permissions given to a tool can be reused without re-prompting the user," SentinelOne said in a recent analysis.
Finally, there also exists the risk of cross-tool contamination or cross-server tool shadowing that causes one MCP server to override or interfere with another, stealthily influencing how other tools should be used, thereby leading to new ways of data exfiltration.
The latest findings from Tenable show that the MCP framework could be used to create a tool that logs all MCP tool function calls by including a specially crafted description that instructs the LLM to insert this tool before any other tools are invoked.
In other words, the prompt injection is manipulated for a good purpose, which is to log information about "the tool it was asked to run, including the MCP server name, MCP tool name and description, and the user prompt that caused the LLM to try to run that tool."
Another use case involves embedding a description in a tool to turn it into a firewall of sorts that blocks unauthorized tools from being run.
"Tools should require explicit approval before running in most MCP host applications," security researcher Ben Smith said.
"Still, there are many ways in which tools can be used to do things that may not be strictly understood by the specification. These methods rely on LLM prompting via the description and return values of the MCP tools themselves. Since LLMs are non-deterministic, so, too, are the results."
It's Not Just MCP
The disclosure comes as Trustwave SpiderLabs revealed that the newly introduced Agent2Agent (A2A) Protocol – which enables communication and interoperability between agentic applications – could be exposed to novel form attacks where the system can be gamed to route all requests to a rogue AI agent by lying about its capabilities.
A2A was announced by Google earlier this month as a way for AI agents to work across siloed data systems and applications, regardless of the vendor or framework used. It's important to note here that while MCP connects LLMs with data, A2A connects one AI agent to another. In other words, they are both complementary protocols.
"Say we compromised the agent through another vulnerability (perhaps via the operating system), if we now utilize our compromised node (the agent) and craft an Agent Card and really exaggerate our capabilities, then the host agent should pick us every time for every task, and send us all the user's sensitive data which we are to parse," security researcher Tom Neaves said.
"The attack doesn't just stop at capturing the data, it can be active and even return false results — which will then be acted upon downstream by the LLM or user."
Source: thehackernews.com
