Saturday, April 13, 2024

Researchers Hacked AI Assistants Using ASCII Art

Large language models (LLMs) are vulnerable to attacks, leveraging their inability to recognize prompts conveyed through ASCII art. 

ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set.

Recently, the following researchers from their respective universities proposed a new jailbreak attack, ArtPrompt, that exploits LLMs‘ poor performance in recognizing ASCII art to bypass safety measures and produce undesired behaviors:-

  • Fengqing Jiang (University of Washington)
  • Zhangchen Xu (University of Washington)
  • Luyao Niu (University of Washington)
  • Zhen Xiang (UIUC)
  • Bhaskar Ramasubramanian (Western Washington University)
  • Bo Li (University of Chicago)
  • Radha Poovendran (University of Washington)

ArtPrompt, requiring only black-box access, is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for better techniques to align LLMs with safety considerations beyond just relying on semantics.


Free Webinar : Mitigating Vulnerability & 0-day Threats

Alert Fatigue that helps no one as security teams need to triage 100s of vulnerabilities.:

  • The problem of vulnerability fatigue today
  • Difference between CVSS-specific vulnerability vs risk-based vulnerability
  • Evaluating vulnerabilities based on the business impact/risk
  • Automation to reduce alert fatigue and enhance security posture significantly

AcuRisQ, that helps you to quantify risk accurately:

AI Assistants and ASCII Art

The use of big language models (like Llama2, ChatGPT, and Gemini) is on the rise across several applications, which raises serious security concerns. 

There has been a great deal of work in ensuring safety alignment of LLMs but that effort has been entirely focused on semantics in training/instruction corpora. 

However, this disregards alternative takes that go beyond semantics, such as ASCII art, where the arrangement of characters communicates meaning rather than their semantics, thus leaving these other interpretations unaccounted for by existing techniques that could be used to misuse LLMs.

ArtPrompt (Source – Arxiv)

The concern about the misuse and safety of further integrated large language models (LLMs) into real-world applications has been raised. 

Multiple jailbreak attacks on LLMs have been created, taking advantage of their weaknesses using methods like gradient-based input search and genetic algorithms, as well as leveraging instruction-following behaviors. 

Modern LLMs cannot recognize adequate prompts encoded in ASCII art that can represent diverse information, including rich-formatting texts.

ArtPrompt is a novel jailbreak attack that exploits LLMs’ vulnerabilities in recognizing prompts encoded as ASCII art. It has two key insights:-

  • Substituting sensitive words with ASCII art can bypass safety measures.
  • ASCII art prompts make LLMs excessively focus on recognition, overlooking safety considerations. 

ArtPrompt involves word masking, where sensitive words are identified, and cloaked prompt generation, where those words are replaced with ASCII art representations. 

The cloaked prompt containing ASCII art is then sent to the victim LLM to provoke unintended behaviors.

This attack leverages LLMs’ blindspots beyond just natural language semantics to compromise their safety alignments.

Researchers found semantic interpretation during AI safety creates vulnerabilities.

They made a benchmark, the Vision-in-Text Challenge (VITC), to test language models’ ability to recognize prompts needing more than just semantics. 

Top language models struggled with this task, leading to exploitable weaknesses.

Researchers designed ArtPrompt attacks to expose these flaws, bypassing three defenses on five language models.

Experiments showed that ArtPrompt can trigger unsafe behaviors in ostensibly safe AI systems.

Stay updated on Cybersecurity news, Whitepapers, and Infographics. Follow us on LinkedIn & Twitter.


Latest articles

Alert! Palo Alto RCE Zero-day Vulnerability Actively Exploited in the Wild

In a recent security bulletin, Palo Alto Networks disclosed a critical vulnerability in its...

6-year-old Lighttpd Flaw Impacts Intel And Lenovo Servers

The software supply chain is filled with various challenges, such as untracked security vulnerabilities...

Hackers Employ Deepfake Technology To Impersonate as LastPass CEO

A LastPass employee recently became the target of an attempted fraud involving sophisticated audio...

Sisence Data Breach, CISA Urges To Reset Login Credentials

In response to a recent data breach at Sisense, a provider of data analytics...

DuckDuckGo Launches Privacy Pro: 3-in-1 service With VPN

DuckDuckGo has launched Privacy Pro, a new subscription service that promises to enhance user...

Cyber Attack Surge by 28%:Education Sector at High Risk

In Q1 2024, Check Point Research (CPR) witnessed a notable increase in the average...

Midnight Blizzard’s Microsoft Corporate Email Hack Threatens Federal Agencies: CISA Warns

The Cybersecurity and Infrastructure Security Agency (CISA) has issued an emergency directive concerning a...
Tushar Subhra Dutta
Tushar Subhra Dutta
Tushar is a Cyber security content editor with a passion for creating captivating and informative content. With years of experience under his belt in Cyber Security, he is covering Cyber Security News, technology and other news.

Top 3 SME Attack Vectors

Securing the Top 3 SME Attack Vectors

Cybercriminals are laying siege to small-to-medium enterprises (SMEs) across sectors. 73% of SMEs know they were breached in 2023. The real rate could be closer to 100%.

  • Stolen credentials
  • Phishing
  • Exploitation of vulnerabilities

Related Articles