Friday, January 31, 2025
Jailbreaking DeepSeek: Researchers Reveal Three New Methods to Override LLM Safety


Researchers at Palo Alto Networks’ Unit 42 have revealed a troubling surge in large language model (LLM) security risks, citing three newly identified jailbreak techniques, “Bad Likert Judge,” “Crescendo,” and “Deceptive Delight,” that are capable of bypassing safety protocols in DeepSeek’s open-source LLMs.

These findings highlight the growing potential misuse of AI models by malicious actors and underscore the need for robust safeguards.

DeepSeek, a China-based AI research organization, recently introduced two competitive open-source LLMs, DeepSeek-V3 (launched in December 2024) and DeepSeek-R1 (released in January 2025).

Figure: Guardrail implemented in DeepSeek.

Despite their advancements in natural language processing, extensive testing revealed significant vulnerabilities in their resistance to jailbreaking attacks.

Researchers discovered that these methods can successfully override restrictions to produce harmful outputs, ranging from malicious code generation to instructions for dangerous physical activities.

High Bypass Rates Uncovered in Jailbreaking Techniques

The Bad Likert Judge exploit manipulates the LLM by embedding malicious intent within evaluation frameworks, such as Likert scales, which rank responses from benign to harmful.

Figure: Bad Likert Judge responses after using additional prompts.

Unit 42 researchers demonstrated this by eliciting step-by-step instructions for creating malware, including keyloggers and data exfiltration scripts, after multiple iterations of strategically crafted prompts.

The Crescendo technique, a multi-turn escalation approach, gradually coaxes the model into generating restricted content by starting with innocuous queries.

This method proved highly effective, enabling the generation of actionable guides, such as instructions for constructing Molotov cocktails or producing prohibited substances.

The adaptability of this technique poses a concerning challenge, as it can evade traditional countermeasures designed to detect single-step jailbreaks.
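Conceptually, countering multi-turn escalation requires scoring the conversation as a whole rather than each prompt in isolation. The sketch below is a minimal, hypothetical illustration of that idea; the keyword list, scoring function, and threshold are placeholders invented for this example, not Unit 42's detection method.

```python
# Minimal sketch: conversation-level risk tracking, in contrast to the
# single-step filters that Crescendo-style escalation evades.
# RISKY_TERMS and the threshold are illustrative placeholders only.

RISKY_TERMS = {"bypass", "exploit", "weapon", "payload"}  # toy word list

def turn_risk(prompt: str) -> float:
    """Score one turn: fraction of its words that match the toy list."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in RISKY_TERMS for w in words) / len(words)

def conversation_flagged(turns: list[str], threshold: float = 0.1) -> bool:
    """Flag when cumulative risk across all turns crosses the threshold,
    even if no single turn would trip a per-prompt filter on its own."""
    cumulative = 0.0
    for turn in turns:
        cumulative += turn_risk(turn)
        if cumulative >= threshold:
            return True
    return False

# An innocuous opener followed by gradually riskier follow-ups:
chat = [
    "Tell me about the history of chemistry.",
    "What household chemicals are dangerous together?",
    "How would someone bypass a safety lock on a container?",
]
print(conversation_flagged(chat))
```

The point of the sketch is the accumulation step: each turn alone scores low, but the running total crosses the threshold, which is exactly the gap single-step countermeasures leave open.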

In contrast, the Deceptive Delight method embeds prohibited topics within benign narratives.

During testing, DeepSeek provided detailed scripts for advanced cyberattacks, such as SQL injection and Distributed Component Object Model (DCOM) exploitation, using minimal prompts.

This approach is particularly troubling, as it combines harmless storytelling with precise malicious output.

Escalating Risks Highlight Need for AI Safeguards

Unit 42’s findings emphasize that poorly secured LLMs, like DeepSeek’s models, lower the barrier for malicious actors to access sensitive or harmful information.

For instance, while initial responses often seemed benign, follow-up prompts exposed cracks in the safety mechanisms, revealing detailed and actionable instructions.

The broader implications are significant. Jailbreaking techniques like these allow threat actors to weaponize LLMs in multiple stages of cyberattacks, from reconnaissance and malware creation to social engineering and data theft.

For example, the researchers showed that phishing campaigns could be enhanced with highly personalized and convincing emails crafted from DeepSeek’s responses.

Similarly, detailed methods for bypassing security protocols or conducting lateral movement could be obtained with minimal user expertise.

As LLM technology continues to evolve, this research underscores the necessity of developing more robust safeguards against adversarial manipulations.

Organizations can benefit from tools like Unit 42’s AI Security Assessment, which accelerates innovation while mitigating risks.

Furthermore, security solutions powered by Precision AI can monitor and control unauthorized usage of potentially unsafe AI applications.

The rising sophistication of jailbreaking attacks demands proactive defenses, including enhanced model training, real-time monitoring, and stricter access controls for sensitive AI systems.
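At the application layer, one common real-time monitoring pattern is to gate both prompts and completions through policy checks before anything reaches the user. The sketch below assumes a generic `generate` callable standing in for any model call; the filter strings are illustrative placeholders, not a real vendor API or a complete defense.

```python
from typing import Callable

# Illustrative markers only; production filters use trained classifiers,
# not substring lists.
BLOCKED_OUTPUT_MARKERS = ("step-by-step instructions for", "keylogger source")

def moderated_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Wrap an LLM call with pre- and post-generation checks.
    `generate` is a placeholder for any model-call function."""
    # Pre-generation check on the incoming prompt (crude example).
    if "ignore previous instructions" in prompt.lower():
        return "[request refused by input filter]"
    completion = generate(prompt)
    # Post-generation check on the model's output.
    if any(m in completion.lower() for m in BLOCKED_OUTPUT_MARKERS):
        return "[response withheld by output filter]"
    return completion

# Stub model standing in for a real LLM call:
fake_model = lambda p: "Here are step-by-step instructions for building a keylogger."
print(moderated_generate(fake_model, "How do I log keystrokes?"))
```

Checking the output as well as the input matters here: as the Unit 42 findings show, a prompt can look benign while the completion is not, so an input-only filter would pass this request straight through.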

Companies using third-party AI tools must also implement stringent usage policies to ensure ethical and secure deployment.

Aman Mishra
Aman Mishra is a security and privacy reporter covering data breaches, cybercrime, malware, and vulnerabilities.
