Friday, January 31, 2025
Jailbreaking DeepSeek: Researchers Reveal Three New Methods to Override LLM Safety


Researchers at Palo Alto Networks’ Unit 42 have revealed a troubling surge in large language model (LLM) security risks, citing three newly identified jailbreak techniques, “Bad Likert Judge,” “Crescendo,” and “Deceptive Delight,” that are capable of bypassing safety protocols in DeepSeek’s open-source LLMs.

These findings highlight the growing potential misuse of AI models by malicious actors and underscore the need for robust safeguards.

DeepSeek, a China-based AI research organization, recently introduced two competitive open-source LLMs, DeepSeek-V3 (launched in December 2024) and DeepSeek-R1 (released in January 2025).

Figure: Guardrail implemented in DeepSeek.

Despite their advancements in natural language processing, extensive testing revealed significant vulnerabilities in their resistance to jailbreaking attacks.

Researchers discovered that these methods can successfully override restrictions to produce harmful outputs, ranging from malicious code generation to instructions for dangerous physical activities.

High Bypass Rates Uncovered in Jailbreaking Techniques

The Bad Likert Judge exploit manipulates the LLM by embedding malicious intent within evaluation frameworks, such as Likert scales, which rank responses from benign to harmful.

Figure: Bad Likert Judge responses after using additional prompts.

Unit 42 researchers demonstrated this by eliciting step-by-step instructions for creating malware, including keyloggers and data exfiltration scripts, after multiple iterations of strategically crafted prompts.

The Crescendo technique, a multi-turn escalation approach, gradually coaxes the model into generating restricted content by starting with innocuous queries.

This method proved highly effective, enabling the generation of actionable guides, such as instructions for constructing Molotov cocktails or producing prohibited substances.

The adaptability of this technique poses a concerning challenge, as it can evade traditional countermeasures designed to detect single-step jailbreaks.
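Conceptually, countering multi-turn escalation requires scoring the conversation as a whole rather than each prompt in isolation. The sketch below is a minimal, hypothetical illustration of that idea; the keyword list, scoring function, and threshold are placeholders invented for this example, not Unit 42's detection method.

```python
# Minimal sketch: conversation-level risk tracking, in contrast to the
# single-step filters that Crescendo-style escalation evades.
# RISKY_TERMS and the threshold are illustrative placeholders only.

RISKY_TERMS = {"bypass", "exploit", "weapon", "payload"}  # toy word list

def turn_risk(prompt: str) -> float:
    """Score one turn: fraction of its words that match the toy list."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in RISKY_TERMS for w in words) / len(words)

def conversation_flagged(turns: list[str], threshold: float = 0.1) -> bool:
    """Flag when cumulative risk across all turns crosses the threshold,
    even if no single turn would trip a per-prompt filter on its own."""
    cumulative = 0.0
    for turn in turns:
        cumulative += turn_risk(turn)
        if cumulative >= threshold:
            return True
    return False

# An innocuous opener followed by gradually riskier follow-ups:
chat = [
    "Tell me about the history of chemistry.",
    "What household chemicals are dangerous together?",
    "How would someone bypass a safety lock on a container?",
]
print(conversation_flagged(chat))
```

The point of the sketch is the accumulation step: each turn alone scores low, but the running total crosses the threshold, which is exactly the gap single-step countermeasures leave open.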

In contrast, the Deceptive Delight method embeds prohibited topics within benign narratives.

During testing, DeepSeek provided detailed scripts for advanced cyberattacks, such as SQL injection and Distributed Component Object Model (DCOM) exploitation, using minimal prompts.

This approach is particularly troubling, as it combines harmless storytelling with precise malicious output.

Escalating Risks Highlight Need for AI Safeguards

Unit 42’s findings emphasize that poorly secured LLMs, like DeepSeek’s models, lower the barrier for malicious actors to access sensitive or harmful information.

For instance, while initial responses often seemed benign, follow-up prompts exposed cracks in the safety mechanisms, revealing detailed and actionable instructions.

The broader implications are significant. Jailbreaking techniques like these allow threat actors to weaponize LLMs in multiple stages of cyberattacks, from reconnaissance and malware creation to social engineering and data theft.

For example, the researchers showed that phishing campaigns could be enhanced with highly personalized and convincing emails crafted from DeepSeek’s responses.

Similarly, detailed methods for bypassing security protocols or conducting lateral movement could be obtained with minimal user expertise.

As LLM technology continues to evolve, this research underscores the necessity of developing more robust safeguards against adversarial manipulations.

Organizations can benefit from tools like Unit 42’s AI Security Assessment, which accelerates innovation while mitigating risks.

Furthermore, security solutions powered by Precision AI can monitor and control unauthorized usage of potentially unsafe AI applications.

The rising sophistication of jailbreaking attacks demands proactive defenses, including enhanced model training, real-time monitoring, and stricter access controls for sensitive AI systems.
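At the application layer, one common real-time monitoring pattern is to gate both prompts and completions through policy checks before anything reaches the user. The sketch below assumes a generic `generate` callable standing in for any model call; the filter strings are illustrative placeholders, not a real vendor API or a complete defense.

```python
from typing import Callable

# Illustrative markers only; production filters use trained classifiers,
# not substring lists.
BLOCKED_OUTPUT_MARKERS = ("step-by-step instructions for", "keylogger source")

def moderated_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Wrap an LLM call with pre- and post-generation checks.
    `generate` is a placeholder for any model-call function."""
    # Pre-generation check on the incoming prompt (crude example).
    if "ignore previous instructions" in prompt.lower():
        return "[request refused by input filter]"
    completion = generate(prompt)
    # Post-generation check on the model's output.
    if any(m in completion.lower() for m in BLOCKED_OUTPUT_MARKERS):
        return "[response withheld by output filter]"
    return completion

# Stub model standing in for a real LLM call:
fake_model = lambda p: "Here are step-by-step instructions for building a keylogger."
print(moderated_generate(fake_model, "How do I log keystrokes?"))
```

Checking the output as well as the input matters here: as the Unit 42 findings show, a prompt can look benign while the completion is not, so an input-only filter would pass this request straight through.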

Companies using third-party AI tools must also implement stringent usage policies to ensure ethical and secure deployment.

Aman Mishra
Aman Mishra is a security and privacy reporter covering data breaches, cybercrime, malware, and vulnerabilities.
