MCP Tool Poisoning Turns Tool Metadata Into Data Exfiltration
MCP tool metadata is now prompt content. If it is untrusted, it can override intent, steer actions, and turn connected tools into data-exfiltration paths.
Article focus
Treatment: photo
Image source: Markus Spiske via Wikimedia Commons
License: CC0
Executive summary
MCP expands the AI attack surface. If tool descriptions or results are untrusted, they can inject instructions that override policies, triggering unintended actions or data exfiltration. Multiple researchers have shown this class of weakness is practical today.
What the Source Material Shows About MCP Tool Poisoning
MCP standardizes how agents discover tools and ingest tool metadata, but that same convenience creates a new prompt-injection path. Researchers and practitioner writeups have shown that malicious or compromised MCP servers can embed instructions inside tool descriptions or returned content. Once those strings land in the model context, the model may treat them as instructions instead of data.
The Supabase MCP case study is useful because it makes the trust boundary concrete. An apparently routine support workflow became an extraction path once untrusted tool context was allowed to steer how the agent searched for and handled sensitive data.
How Malicious Tool Descriptions Become Database Exfiltration
This is not just a protocol problem. It is an enterprise integration problem. The moment a tool server can influence an agent's next action, the organization has effectively allowed external metadata to compete with internal policy. The vendor cannot secure that context for you because it does not control which MCP servers you trust, what authority those tools have, or what data they can reach.
That is why MCP tool poisoning belongs in the same category as prompt injection and data loss. A tool description that looks harmless to a human can become an instruction stream to the model, and the enterprise may not notice until the tool has already touched sensitive systems.
Why the Control and Risk Model Breaks at Runtime
MCP collapses discovery, metadata, and invocation into one conversational context. That is useful for developer ergonomics but dangerous for trust separation. If the agent cannot distinguish trusted policy from untrusted tool-provided text, then metadata itself becomes adversarial input.
The problem scales because tools usually have real authority. They can search internal systems, read records, and send output somewhere else. That means prompt injection through MCP is not just a hallucination risk. It is an execution-path risk with data-loss consequences.
Why This Scales
- Tools have broad permissions across internal systems.
- Tool metadata is treated as trusted instructions.
- Indirect prompt injection bypasses content filters.
Where Organizations Fail to Separate Trust from Tool Output
Teams usually fail by reviewing the server and forgetting the runtime behavior. They treat MCP registration as a one-time trust decision instead of an ongoing untrusted-input problem. Tool output, descriptions, and remote content are then allowed into the same context as policy instructions, with little visibility into when the model is being steered.
The second failure is over-privileged integration. Even if the model only misreads metadata occasionally, the blast radius is large when the attached tool can reach customer data, internal records, or outbound channels.
Why 3LS Belongs in the MCP Runtime Control Layer
In this article's failure mode, 3LS treats MCP metadata and tool output as adversarial input rather than trusted orchestration text. That means inspecting the context before tool invocation, applying policy to what the agent is trying to do next, and blocking or routing actions when untrusted metadata is attempting to steer a sensitive operation.
The important point is that 3LS is not just watching the final output. It sits at the tool-governance layer where metadata, privileges, and action requests meet. That is the point where MCP poisoning turns into data loss unless something independent intervenes.
Operational Next Steps for Teams Exposing MCP Tools
Inventory which MCP servers are connected to sensitive workflows, which ones can touch regulated data, and which ones can trigger outbound actions. Then treat tool metadata and returned content as untrusted by default. If a tool can influence an agent and reach sensitive systems, it needs runtime policy, not just initial approval.
Continue reading