Saturday, December 14, 2024
HomeArtificial IntelligenceResearchers Hacked AI Assistants Using ASCII Art

Researchers Hacked AI Assistants Using ASCII Art

Published on

SIEM as a Service

Large language models (LLMs) are vulnerable to attacks, leveraging their inability to recognize prompts conveyed through ASCII art. 

ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set.

Recently, the following researchers from their respective universities proposed a new jailbreak attack, ArtPrompt, that exploits LLMs‘ poor performance in recognizing ASCII art to bypass safety measures and produce undesired behaviors:-

- Advertisement - SIEM as a Service
  • Fengqing Jiang (University of Washington)
  • Zhangchen Xu (University of Washington)
  • Luyao Niu (University of Washington)
  • Zhen Xiang (UIUC)
  • Bhaskar Ramasubramanian (Western Washington University)
  • Bo Li (University of Chicago)
  • Radha Poovendran (University of Washington)

ArtPrompt, requiring only black-box access, is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for better techniques to align LLMs with safety considerations beyond just relying on semantics.

Document

Free Webinar : Mitigating Vulnerability & 0-day Threats

Alert Fatigue that helps no one as security teams need to triage 100s of vulnerabilities.:

  • The problem of vulnerability fatigue today
  • Difference between CVSS-specific vulnerability vs risk-based vulnerability
  • Evaluating vulnerabilities based on the business impact/risk
  • Automation to reduce alert fatigue and enhance security posture significantly

AcuRisQ, that helps you to quantify risk accurately:

AI Assistants and ASCII Art

The use of big language models (like Llama2, ChatGPT, and Gemini) is on the rise across several applications, which raises serious security concerns. 

There has been a great deal of work in ensuring safety alignment of LLMs but that effort has been entirely focused on semantics in training/instruction corpora. 

However, this disregards alternative takes that go beyond semantics, such as ASCII art, where the arrangement of characters communicates meaning rather than their semantics, thus leaving these other interpretations unaccounted for by existing techniques that could be used to misuse LLMs.

ArtPrompt (Source – Arxiv)

The concern about the misuse and safety of further integrated large language models (LLMs) into real-world applications has been raised. 

Multiple jailbreak attacks on LLMs have been created, taking advantage of their weaknesses using methods like gradient-based input search and genetic algorithms, as well as leveraging instruction-following behaviors. 

Modern LLMs cannot recognize adequate prompts encoded in ASCII art that can represent diverse information, including rich-formatting texts.

ArtPrompt is a novel jailbreak attack that exploits LLMs’ vulnerabilities in recognizing prompts encoded as ASCII art. It has two key insights:-

  • Substituting sensitive words with ASCII art can bypass safety measures.
  • ASCII art prompts make LLMs excessively focus on recognition, overlooking safety considerations. 

ArtPrompt involves word masking, where sensitive words are identified, and cloaked prompt generation, where those words are replaced with ASCII art representations. 

The cloaked prompt containing ASCII art is then sent to the victim LLM to provoke unintended behaviors.

This attack leverages LLMs’ blindspots beyond just natural language semantics to compromise their safety alignments.

Researchers found semantic interpretation during AI safety creates vulnerabilities.

They made a benchmark, the Vision-in-Text Challenge (VITC), to test language models’ ability to recognize prompts needing more than just semantics. 

Top language models struggled with this task, leading to exploitable weaknesses.

Researchers designed ArtPrompt attacks to expose these flaws, bypassing three defenses on five language models.

Experiments showed that ArtPrompt can trigger unsafe behaviors in ostensibly safe AI systems.

Stay updated on Cybersecurity news, Whitepapers, and Infographics. Follow us on LinkedIn & Twitter.

Tushar Subhra
Tushar Subhra
Tushar is a Cyber security content editor with a passion for creating captivating and informative content. With years of experience under his belt in Cyber Security, he is covering Cyber Security News, technology and other news.

Latest articles

“Password Era is Ending,” Microsoft to Delete 1 Billion Passwords

Microsoft has announced that it is currently blocking an astounding 7,000 password attacks every...

Over 300,000 Prometheus Servers Vulnerable to DoS Attacks Due to RepoJacking Exploit

The research identified vulnerabilities in Prometheus, including information disclosure from exposed servers, DoS risks...

Reyee OS IoT Devices Compromised: Over-The-Air Attack Bypasses Wi-Fi Logins

Researchers discovered multiple vulnerabilities in Ruijie Networks' cloud-connected devices. By exploiting these vulnerabilities, attackers...

New Android Banking Malware Attacking Indian Banks To Steal Login Credentials

Researchers have discovered a new Android banking trojan targeting Indian users, and this malware...

API Security Webinar

72 Hours to Audit-Ready API Security

APIs present a unique challenge in this landscape, as risk assessment and mitigation are often hindered by incomplete API inventories and insufficient documentation.

Join Vivek Gopalan, VP of Products at Indusface, in this insightful webinar as he unveils a practical framework for discovering, assessing, and addressing open API vulnerabilities within just 72 hours.

Discussion points

API Discovery: Techniques to identify and map your public APIs comprehensively.
Vulnerability Scanning: Best practices for API vulnerability analysis and penetration testing.
Clean Reporting: Steps to generate a clean, audit-ready vulnerability report within 72 hours.

More like this

Dell Security Update, Patch for Multiple Critical Vulnerabilities

Dell Technologies has released a security advisory addressing multiple critical vulnerabilities that could expose...

Cleo 0-day Vulnerability Exploited to Deploy Malichus Malware

Cybersecurity researchers have uncovered a sophisticated exploitation campaign involving a zero-day (0-day) vulnerability in...

GitLab Security Update, Patch for Critical Vulnerabilities

GitLab announced the release of critical security patches for its Community Edition (CE) and...