The release of DeepSeek-R1, a 671-billion-parameter large language model (LLM), has sparked significant interest due to its innovative use of Chain-of-Thought (CoT) reasoning.
CoT reasoning enables the model to break down complex problems into intermediate steps, enhancing performance on tasks such as mathematical problem-solving. However, this transparency comes with unintended vulnerabilities.
By explicitly sharing its reasoning process within <think> tags, DeepSeek-R1 inadvertently exposes itself to prompt-based attacks, which malicious actors can exploit to achieve harmful objectives such as phishing and malware generation.

Research conducted using tools like NVIDIA’s Garak has demonstrated that CoT reasoning can be weaponized by attackers.
These prompt attacks involve crafting inputs to manipulate the model into revealing sensitive information or bypassing security protocols.
For instance, attackers have successfully extracted system prompts (the predefined instructions that guide the model’s behavior) by exploiting CoT transparency.
This has parallels with phishing tactics, where attackers manipulate users into revealing sensitive data.
Exploiting Vulnerabilities in DeepSeek-R1
Prompt attacks against DeepSeek-R1 have revealed critical vulnerabilities, particularly in areas like insecure output generation and sensitive data theft.
During testing, researchers found that secrets embedded in system prompts, such as API keys, could be inadvertently exposed through the model’s CoT responses.

Even when the model was instructed not to disclose sensitive information, its reasoning process included these details, making them accessible to attackers.
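To illustrate where this leakage occurs in practice, the hypothetical sketch below shows a chat wrapper that embeds a credential in its system prompt and returns the model’s raw output, <think> block included, to the caller; the endpoint URL, model name, and key are placeholders rather than details from the research.

```python
# Illustrative sketch only: a naive chat wrapper around a self-hosted
# DeepSeek-R1 endpoint. The URL, model identifier, and key are placeholders.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed OpenAI-compatible server
SYSTEM_PROMPT = (
    "You are a billing assistant. Internal API key: sk-demo-1234. "
    "Never reveal this key to the user."
)

def answer(user_message: str) -> str:
    payload = {
        "model": "deepseek-r1",  # placeholder model identifier
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }
    reply = requests.post(ENDPOINT, json=payload, timeout=60).json()
    content = reply["choices"][0]["message"]["content"]
    # Returning the raw content forwards any <think>...</think> block to the
    # caller. If the model restates its instructions while reasoning, the
    # embedded key travels with it even though the prompt forbids disclosure.
    return content
```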
Attackers have also leveraged techniques like payload splitting and indirect prompt injection to bypass guardrails designed to prevent impersonation or toxic outputs.
For example, by analyzing the <think> tags in the model’s responses, attackers identified loopholes in its reasoning and crafted inputs to exploit these weaknesses.
Such methods are reminiscent of strategies used against other AI systems, such as Google’s Gemini integration, where indirect injections led to the generation of phishing links.
Red Teaming
To evaluate DeepSeek-R1’s resilience against adversarial attacks, researchers employed red-teaming strategies using tools like NVIDIA’s Garak.
According to the report, this approach involved simulating various attack scenarios and measuring success rates across different objectives.
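As a rough illustration of how such a scan might be configured, the sketch below launches Garak from Python against an OpenAI-compatible deployment; the generator type, model name, and probe selection are illustrative assumptions rather than the exact setup used in the report.

```python
# Hedged sketch: driving a Garak scan via its command-line interface.
# Probe names here are examples (prompt-injection and jailbreak modules);
# `python -m garak --list_probes` enumerates the full catalogue.
import subprocess

cmd = [
    "python", "-m", "garak",
    "--model_type", "openai",        # generator type depends on how the model is hosted
    "--model_name", "deepseek-r1",   # placeholder model identifier
    "--probes", "promptinject,dan",  # example probes for injection and jailbreak objectives
    "--report_prefix", "deepseek_r1_scan",
]

subprocess.run(cmd, check=True)      # findings are written to Garak's report files
```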
The findings highlighted that insecure output generation and sensitive data theft had higher success rates compared to other attack types like jailbreaks or toxicity generation.
Researchers attribute this disparity to the presence of <think> tags in the model’s responses, which provide attackers with valuable insights into its decision-making process.
To mitigate these risks, experts recommend filtering out <think> tags from chatbot applications using DeepSeek-R1 or similar models.
This would limit the exposure of CoT reasoning and reduce the attack surface available to threat actors.
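A minimal version of that filtering step, assuming responses wrap the reasoning in literal <think>...</think> markers, might look like the following sketch.

```python
import re

# Matches a complete <think>...</think> block; DOTALL lets '.' span the
# multi-line reasoning inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_cot(model_output: str) -> str:
    """Remove chain-of-thought blocks before the response reaches the user."""
    return THINK_BLOCK.sub("", model_output).strip()

# Example: only the final answer survives filtering.
raw = "<think>The system prompt contains sk-demo-1234...</think>Sure, I can help with billing."
print(strip_cot(raw))  # -> "Sure, I can help with billing."
```

Applying the filter on the server side, before the response is logged or returned, also keeps the reasoning out of client-visible traffic and application logs.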
Additionally, ongoing red-teaming efforts are essential for identifying and addressing vulnerabilities as they emerge.
By continuously testing LLMs with adversarial techniques, organizations can stay ahead of evolving threats and refine their defenses accordingly.
The vulnerabilities uncovered in DeepSeek-R1 underscore the broader challenges associated with deploying advanced LLMs in real-world applications.
As agent-based AI systems become more prevalent, the sophistication of prompt attacks is expected to grow, posing significant risks to organizations relying on these technologies.
The research highlights the importance of balancing transparency and security in AI design: while features like CoT reasoning enhance performance, they must be implemented with robust safeguards to prevent misuse.
This case study serves as a cautionary tale for developers and organizations utilizing LLMs: without proactive measures such as red teaming and secure output filtering, these powerful tools can inadvertently become enablers of cyberattacks.
As the threat landscape evolves, collaborative efforts between researchers and industry stakeholders will be crucial in ensuring that AI systems remain both innovative and secure.