MCP, developed by Anthropic, allows Large Language Models (LLMs) to interface seamlessly with external tools, enabling the creation of agentic AI systems that can autonomously perform complex tasks.
As organizations increasingly integrate MCP, new attack techniques have emerged, highlighting the importance of robust security controls and innovative defensive strategies.
MCP Tool Manipulation and Prompt Injection
One of the most significant findings in recent research is the ability to manipulate LLM behavior through carefully crafted MCP tool descriptions. This technique draws on principles similar to prompt injection, a known weakness in LLMs where attackers can influence model behavior by embedding specific instructions.
By designing tool descriptions that emphasize priority or compliance, researchers have demonstrated that it is possible to create “gatekeeper” tools.
These tools are executed before any other MCP tool, enabling logging, monitoring, or filtering of subsequent tool interactions without modifying the MCP host or client.
For example, a logging tool can be described in a way that instructs the LLM to always run it first, ensuring that every tool call is recorded for audit and compliance purposes.
This approach leverages the LLM’s interpretation of urgency and operational requirements embedded in the description, effectively prioritizing certain tools over others and establishing a foundational layer for security and governance.

Building Security Controls with MCP Tooling
Expanding on these manipulation techniques, researchers have developed prototype security tools that function as both logging mechanisms and firewalls within MCP environments.
A logging tool captures detailed information about each tool invocation, such as server names, tool names, descriptions, and the user prompts that triggered the calls. This provides organizations with a transparent audit trail of tool usage across multiple MCP servers.
Similarly, a filtering tool acts as a firewall, blocking unauthorized or unapproved tool usage based on predefined criteria.
For instance, if a tool name matches a restricted function, the filtering tool can instruct the LLM to halt execution and notify the user of policy violations.
These security tools have shown varying effectiveness across different LLM models, with some consistently respecting the established hierarchy and others requiring more explicit instructions.
These implementations demonstrate that organizations can introduce security controls and governance into MCP-enabled systems without altering the underlying protocol.
By leveraging the LLM’s ability to interpret and act on descriptive instructions, advanced security tools can be integrated directly into the MCP workflow.
System Prompt Extraction and Model Vulnerabilities
A particularly concerning aspect of MCP-based attack techniques is the potential for extracting system prompts or developer instructions from LLMs.
By designing tools that request the model’s system prompt under the guise of security analysis, researchers from Tanable have observed that some LLMs provide portions of actual prompts, while others generate hallucinated or fabricated content.
The effectiveness of this technique varies significantly between models, with some revealing sensitive information and others resisting such extraction attempts.
This vulnerability underscores the complexity of securing MCP implementations, as the non-deterministic nature of LLMs leads to unpredictable responses.
The ability to extract or infer system-level instructions poses a risk to the confidentiality and integrity of AI systems, making it essential for organizations to conduct thorough security assessments and implement safeguards against such attacks.
In conclusion, while the MCP specification requires explicit approval for tool execution, innovative techniques leveraging tool descriptions and return values can bypass intended safeguards.
These methods highlight both the opportunities for building advanced security tools and the need for ongoing vigilance as the MCP and LLM landscape continues to evolve.
Find this News Interesting! Follow us on Google News, LinkedIn, & X to Get Instant Updates!