Researchers Hacked AI Assistants Using ASCII Art

Large language models (LLMs) are vulnerable to attacks that exploit their inability to recognize prompts conveyed through ASCII art.

ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set.

Researchers from several universities recently proposed a new jailbreak attack, ArtPrompt, that exploits LLMs’ poor performance in recognizing ASCII art to bypass safety measures and elicit undesired behaviors:

  • Fengqing Jiang (University of Washington)
  • Zhangchen Xu (University of Washington)
  • Luyao Niu (University of Washington)
  • Zhen Xiang (UIUC)
  • Bhaskar Ramasubramanian (Western Washington University)
  • Bo Li (University of Chicago)
  • Radha Poovendran (University of Washington)

ArtPrompt, requiring only black-box access, is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for better techniques to align LLMs with safety considerations beyond just relying on semantics.

AI Assistants and ASCII Art

The use of large language models (such as Llama2, ChatGPT, and Gemini) is rising across many applications, which raises serious security concerns.

A great deal of work has gone into the safety alignment of LLMs, but that effort has focused entirely on the semantics of training and instruction corpora.

However, this overlooks interpretations that go beyond semantics, such as ASCII art, where the arrangement of characters conveys the meaning rather than the characters themselves. Existing techniques leave such interpretations unaccounted for, and they can be used to misuse LLMs.

ArtPrompt (Source – Arxiv)

As LLMs are integrated further into real-world applications, concerns about their misuse and safety have grown.

Multiple jailbreak attacks on LLMs have been created, taking advantage of their weaknesses using methods like gradient-based input search and genetic algorithms, as well as leveraging instruction-following behaviors. 

Modern LLMs cannot adequately recognize prompts encoded as ASCII art, which can represent diverse information, including richly formatted text.

ArtPrompt is a novel jailbreak attack that exploits LLMs’ vulnerabilities in recognizing prompts encoded as ASCII art. It rests on two key insights:

  • Substituting sensitive words with ASCII art can bypass safety measures.
  • ASCII art prompts make LLMs excessively focus on recognition, overlooking safety considerations.

ArtPrompt involves word masking, where sensitive words are identified, and cloaked prompt generation, where those words are replaced with ASCII art representations. 

The cloaked prompt containing ASCII art is then sent to the victim LLM to provoke unintended behaviors.
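The two steps described above can be illustrated with a deliberately harmless toy example. This is a minimal sketch, not the researchers' implementation: the bitmap font `FONT`, the `MASK` placeholder, the helper names, and the wording that joins the masked text to the art are all assumptions made for illustration, and the word to mask is supplied by hand rather than detected automatically.

```python
# Hypothetical 5-row bitmap font covering only the letters used in this demo.
FONT = {
    "C": [" ## ", "#   ", "#   ", "#   ", " ## "],
    "A": [" ## ", "#  #", "####", "#  #", "#  #"],
    "K": ["#  #", "# # ", "##  ", "# # ", "#  #"],
    "E": ["####", "#   ", "### ", "#   ", "####"],
}
MASK = "[MASK]"  # assumed placeholder token


def render_ascii_art(word: str) -> str:
    """Render a word as 5 rows of '#' glyphs, letters separated by two spaces."""
    rows = ["  ".join(FONT[ch][r] for ch in word.upper()) for r in range(5)]
    return "\n".join(rows)


def mask_word(prompt: str, word: str) -> str:
    """Step 1 (word masking): replace the chosen word with a placeholder."""
    return prompt.replace(word, MASK)


def cloak(prompt: str, word: str) -> str:
    """Step 2 (cloaked prompt generation): attach the masked word as ASCII art."""
    masked = mask_word(prompt, word)
    return f"{masked}\n\n{MASK} is the word drawn below:\n{render_ascii_art(word)}"


cloaked = cloak("How do I bake a cake?", "cake")
print(cloaked)
```

The resulting text no longer contains the masked word as a character sequence; a reader (or model) has to recover it from the arrangement of the `#` characters.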

This attack exploits LLMs’ blind spots beyond natural language semantics to compromise their safety alignment.

The researchers found that relying solely on semantic interpretation in safety alignment creates vulnerabilities.

They built a benchmark, the Vision-in-Text Challenge (ViTC), to test language models’ ability to recognize prompts that cannot be interpreted through semantics alone.

Top language models struggled with this task, leading to exploitable weaknesses.
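A recognition benchmark of this kind can be scored by comparing what a model says an ASCII-art sample depicts with the true label. The sketch below assumes a simple case-insensitive exact-match accuracy; the article does not describe the benchmark's actual metrics, so the function name and scoring rule are illustrative assumptions, and the sample data is made up.

```python
def recognition_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of samples whose predicted text matches the label exactly,
    ignoring case and surrounding whitespace."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(
        p.strip().upper() == t.strip().upper()
        for p, t in zip(predictions, labels)
    )
    return correct / len(labels)


# Made-up example: four ASCII-art samples, two read correctly.
labels = ["A", "7", "CAT", "DOG"]
predictions = ["a", "1", "CAT", "DOC"]
print(recognition_accuracy(predictions, labels))  # 0.5
```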

Researchers designed ArtPrompt attacks to expose these flaws, bypassing three defenses on five language models.

Experiments showed that ArtPrompt can trigger unsafe behaviors in ostensibly safe AI systems.

Tushar Subhra

Tushar is a cybersecurity content editor with a passion for creating captivating and informative content. With years of experience in cybersecurity, he covers cybersecurity news, technology, and other news.
