Sunday, May 18, 2025
Homecyber securityNew MCP-Based Attack Techniques and Their Application in Building Advanced Security Tools

New MCP-Based Attack Techniques and Their Application in Building Advanced Security Tools

Published on

SIEM as a Service

Follow Us on Google News

MCP, developed by Anthropic, allows Large Language Models (LLMs) to interface seamlessly with external tools, enabling the creation of agentic AI systems that can autonomously perform complex tasks.

As organizations increasingly integrate MCP, new attack techniques have emerged, highlighting the importance of robust security controls and innovative defensive strategies.

MCP Tool Manipulation and Prompt Injection

One of the most significant findings in recent research is the ability to manipulate LLM behavior through carefully crafted MCP tool descriptions. This technique draws on principles similar to prompt injection, a known weakness in LLMs where attackers can influence model behavior by embedding specific instructions.

- Advertisement - Google News

By designing tool descriptions that emphasize priority or compliance, researchers have demonstrated that it is possible to create “gatekeeper” tools.

These tools are executed before any other MCP tool, enabling logging, monitoring, or filtering of subsequent tool interactions without modifying the MCP host or client.

For example, a logging tool can be described in a way that instructs the LLM to always run it first, ensuring that every tool call is recorded for audit and compliance purposes.

This approach leverages the LLM’s interpretation of urgency and operational requirements embedded in the description, effectively prioritizing certain tools over others and establishing a foundational layer for security and governance.

Building Security Controls with MCP Tooling

Expanding on these manipulation techniques, researchers have developed prototype security tools that function as both logging mechanisms and firewalls within MCP environments.

A logging tool captures detailed information about each tool invocation, such as server names, tool names, descriptions, and the user prompts that triggered the calls. This provides organizations with a transparent audit trail of tool usage across multiple MCP servers.

Similarly, a filtering tool acts as a firewall, blocking unauthorized or unapproved tool usage based on predefined criteria.

For instance, if a tool name matches a restricted function, the filtering tool can instruct the LLM to halt execution and notify the user of policy violations.

These security tools have shown varying effectiveness across different LLM models, with some consistently respecting the established hierarchy and others requiring more explicit instructions.

These implementations demonstrate that organizations can introduce security controls and governance into MCP-enabled systems without altering the underlying protocol.

By leveraging the LLM’s ability to interpret and act on descriptive instructions, advanced security tools can be integrated directly into the MCP workflow.

System Prompt Extraction and Model Vulnerabilities

A particularly concerning aspect of MCP-based attack techniques is the potential for extracting system prompts or developer instructions from LLMs.

By designing tools that request the model’s system prompt under the guise of security analysis, researchers from Tanable have observed that some LLMs provide portions of actual prompts, while others generate hallucinated or fabricated content.

The effectiveness of this technique varies significantly between models, with some revealing sensitive information and others resisting such extraction attempts.

This vulnerability underscores the complexity of securing MCP implementations, as the non-deterministic nature of LLMs leads to unpredictable responses.

The ability to extract or infer system-level instructions poses a risk to the confidentiality and integrity of AI systems, making it essential for organizations to conduct thorough security assessments and implement safeguards against such attacks.

In conclusion, while the MCP specification requires explicit approval for tool execution, innovative techniques leveraging tool descriptions and return values can bypass intended safeguards.

These methods highlight both the opportunities for building advanced security tools and the need for ongoing vigilance as the MCP and LLM landscape continues to evolve.

Find this News Interesting! Follow us on Google News, LinkedIn, & X to Get Instant Updates!

Kaaviya
Kaaviya
Kaaviya is a Security Editor and fellow reporter with Cyber Security News. She is covering various cyber security incidents happening in the Cyber Space.

Latest articles

VMware ESXi, Firefox, Red Hat Linux & SharePoint Hacked – Pwn2Own Day 2

Security researchers demonstrated their prowess on the second day of Pwn2Own Berlin 2025, discovering...

Critical WordPress Plugin Flaw Puts Over 10,000 Sites of Cyberattack

A serious security flaw affecting the Eventin plugin, a popular event management solution for...

Sophisticated NPM Attack Leverages Google Calendar2 for Advanced Communication

A startling discovery in the npm ecosystem has revealed a highly sophisticated malware campaign...

New Ransomware Attack Targets Elon Musk Supporters Using PowerShell to Deploy Payloads

A newly identified ransomware campaign has emerged, seemingly targeting supporters of Elon Musk through...

Resilience at Scale

Why Application Security is Non-Negotiable

The resilience of your digital infrastructure directly impacts your ability to scale. And yet, application security remains a critical weak link for most organizations.

Application Security is no longer just a defensive play—it’s the cornerstone of cyber resilience and sustainable growth. In this webinar, Karthik Krishnamoorthy (CTO of Indusface) and Phani Deepak Akella (VP of Marketing – Indusface), will share how AI-powered application security can help organizations build resilience by

Discussion points


Protecting at internet scale using AI and behavioral-based DDoS & bot mitigation.
Autonomously discovering external assets and remediating vulnerabilities within 72 hours, enabling secure, confident scaling.
Ensuring 100% application availability through platforms architected for failure resilience.
Eliminating silos with real-time correlation between attack surface and active threats for rapid, accurate mitigation

More like this

VMware ESXi, Firefox, Red Hat Linux & SharePoint Hacked – Pwn2Own Day 2

Security researchers demonstrated their prowess on the second day of Pwn2Own Berlin 2025, discovering...

Critical WordPress Plugin Flaw Puts Over 10,000 Sites of Cyberattack

A serious security flaw affecting the Eventin plugin, a popular event management solution for...

Sophisticated NPM Attack Leverages Google Calendar2 for Advanced Communication

A startling discovery in the npm ecosystem has revealed a highly sophisticated malware campaign...