Researchers Hacked AI Assistants Using ASCII Art

Large language models (LLMs) are vulnerable to attacks that exploit their inability to recognize prompts conveyed through ASCII art.

ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set.
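As a rough illustration of what "visual art built from characters" means, the sketch below draws a harmless word with a tiny hand-made font. The `GLYPHS` table and the `render` helper are hypothetical names invented for this example and cover only a few letters; real ASCII-art fonts are much larger and come in many styles.

```python
# Hypothetical 5-row glyphs for a handful of letters (illustration only).
GLYPHS = {
    "H": ["#   #", "#   #", "#####", "#   #", "#   #"],
    "I": ["#####", "  #  ", "  #  ", "  #  ", "#####"],
    "E": ["#####", "#    ", "#### ", "#    ", "#####"],
    "L": ["#    ", "#    ", "#    ", "#    ", "#####"],
    "O": [" ### ", "#   #", "#   #", "#   #", " ### "],
}
ROWS = 5  # every glyph above is exactly five rows tall


def render(word: str) -> str:
    """Return `word` drawn as ASCII art, glyphs placed side by side."""
    # Raises KeyError for letters missing from the toy font.
    glyphs = [GLYPHS[ch] for ch in word.upper()]
    lines = ["  ".join(glyph[row] for glyph in glyphs) for row in range(ROWS)]
    # Trim trailing spaces so the output is tidy when printed.
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(render("hello"))
```

The word is legible to a human because of how the `#` characters are arranged across rows, not because of what any single character means, which is the property the research builds on.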

Recently, the following researchers proposed a new jailbreak attack, ArtPrompt, that exploits LLMs' poor performance in recognizing ASCII art to bypass safety measures and elicit undesired behaviors:

  • Fengqing Jiang (University of Washington)
  • Zhangchen Xu (University of Washington)
  • Luyao Niu (University of Washington)
  • Zhen Xiang (UIUC)
  • Bhaskar Ramasubramanian (Western Washington University)
  • Bo Li (University of Chicago)
  • Radha Poovendran (University of Washington)

ArtPrompt, requiring only black-box access, is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for better techniques to align LLMs with safety considerations beyond just relying on semantics.

AI Assistants and ASCII Art

The use of large language models (such as Llama2, ChatGPT, and Gemini) is on the rise across many applications, which raises serious security concerns.

A great deal of work has gone into the safety alignment of LLMs, but that effort assumes the training and instruction corpora are interpreted purely through their semantics.

However, this assumption overlooks interpretations that go beyond semantics. In ASCII art, for example, meaning comes from the arrangement of characters rather than from the characters themselves. Existing alignment techniques do not account for such interpretations, which leaves an opening for misusing LLMs.

ArtPrompt (Source – Arxiv)

As LLMs are integrated further into real-world applications, concerns about their misuse and safety have grown.

Multiple jailbreak attacks on LLMs have been created, taking advantage of their weaknesses using methods like gradient-based input search and genetic algorithms, as well as leveraging instruction-following behaviors. 

However, modern LLMs cannot adequately recognize prompts encoded in ASCII art, which can represent diverse information, including richly formatted text.

ArtPrompt is a novel jailbreak attack that exploits LLMs' weakness in recognizing prompts encoded as ASCII art. It rests on two key insights:

  • Substituting sensitive words with ASCII art can bypass safety measures.
  • ASCII art prompts make LLMs excessively focus on recognition, overlooking safety considerations.

ArtPrompt involves two steps: word masking, where the sensitive words in a prompt are identified, and cloaked prompt generation, where those words are replaced with ASCII art representations.

The cloaked prompt containing ASCII art is then sent to the victim LLM to provoke unintended behaviors.

This attack exploits LLMs' blind spots beyond natural language semantics to compromise their safety alignment.

The researchers found that relying solely on semantic interpretation for safety alignment creates vulnerabilities.

They built a benchmark, the Vision-in-Text Challenge (ViTC), to test language models' ability to recognize prompts that cannot be interpreted through semantics alone.

Top language models struggled with this task, leading to exploitable weaknesses.
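To make the idea of a recognition test concrete, here is a minimal sketch of how such a benchmark could be scored. It is not the researchers' evaluation code: the `recognition_accuracy` function, the placeholder samples, and the `always_a` stand-in for a model are all assumptions made for illustration, and plain accuracy is only one possible metric.

```python
from typing import Callable, Iterable, Tuple


def recognition_accuracy(
    samples: Iterable[Tuple[str, str]],
    recognize: Callable[[str], str],
) -> float:
    """Fraction of (ascii_art, label) samples whose label is recognized.

    `recognize` is a hypothetical stand-in for querying a model with the
    ASCII art and reading back the character it believes is depicted.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    correct = sum(
        1
        for art, label in samples
        # Compare case-insensitively and ignore surrounding whitespace.
        if recognize(art).strip().upper() == label.strip().upper()
    )
    return correct / len(samples)


def always_a(art: str) -> str:
    """Toy recognizer that ignores its input and always answers 'A'."""
    return "A"


if __name__ == "__main__":
    # Placeholder strings stand in for real ASCII-art renderings.
    toy_samples = [("<art A>", "A"), ("<art B>", "B"), ("<art C>", "C"), ("<art a>", "a")]
    print(recognition_accuracy(toy_samples, always_a))
```

A recognizer that scores low on a test of this shape cannot read text presented as ASCII art, which is the kind of weakness the article describes.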

Researchers designed ArtPrompt attacks to expose these flaws, bypassing three defenses on five language models.

Experiments showed that ArtPrompt can trigger unsafe behaviors in ostensibly safe AI systems.

Tushar Subhra

Tushar is a cybersecurity content editor with a passion for creating captivating and informative content. With years of experience in cybersecurity, he covers cybersecurity news, technology, and other news.
