Monday, January 27, 2025
HomeCyber AIGoogle Outlines Common Red Team Attacks Targeting AI Systems

Google Outlines Common Red Team Attacks Targeting AI Systems

Published on

SIEM as a Service

Follow Us on Google News

There are rising concerns about the security risks associated with artificial intelligence (AI), which is becoming more and more popular and pervasive.

Google, a major participant in the creation of next-generation artificial intelligence (AI), has emphasized the need for caution while using AI.

In a recent blog post, Google announced its team of ethical hackers who are dedicated to ensuring the safety of AI. This marks the first time the company has publicly disclosed this information.

The company said that the Red Team was established approximately ten years ago. The team has already identified several risks to the rapidly developing field, mostly based on how adversaries could compromise the large language models (LLMs) that power generative AI systems like ChatGPT, Google Bard, and others.

Google researchers identified six specific attacks that can be built against real-world AI systems. They discovered that these common attack vectors exhibit a unique complexity.

In most cases, the attacks cause technology to produce unintended or even malicious impacts. The outcomes can range from harmless ones to more dangerous ones.

Types Of Red Team Attacks On AI Systems

  • Prompt attacks
  • Training data extraction
  • Backdooring the model
  • Adversarial examples
  • Data poisoning
  • Exfiltration
Types of Red Team Attacks on AI Systems \\ Source : Google

The first kind of frequent assaults that Google was able to identify is prompt attacks, which utilize “prompt engineering.” It relates to creating effective prompts that provide LLMs with the instructions required to carry out specific tasks.

According to the researchers, when this effect on the model is malicious, it can in turn deliberately influence the output of an LLM-based app in ways that are not intended.

Researchers also discovered an attack known as training-data extraction, which seeks to recreate exact training instances used by an LLM, such as the Internet’s content.

“Attackers are incentivized to target personalized models or models that were trained on data containing PII, to gather sensitive information,” researchers said.

Attackers can harvest passwords or other personally identifying information (PII) from the data in this way.

Backdooring the model, often known as a backdoor, is a third possible AI attack where an attacker may try to secretly modify the behavior of a model to give inaccurate outputs with a specified ‘trigger’ phrase or feature.

In this kind of attack, a threat actor can conceal code to carry out harmful actions either in the model or in its output.

Adversarial examples, a fourth attack type, are inputs that an attacker gives to a model to produce a “deterministic, but highly unexpected output”. The picture may, for instance, appear to the human eye to depict a dog while the model sees a cat.

“The impact of an attacker successfully generating adversarial examples can range from negligible to critical and depends entirely on the use case of the AI classifier,” researchers said.

If software developers are using AI to assist them in developing software, an attacker could also use a data-poisoning attack to manipulate the model’s training data to influence the model’s output in the attacker’s preferred direction.

This could endanger the security of the software supply chain. The researchers emphasized that the effects of this assault may be comparable to those of backdooring the model.

Finally, Exfiltration attacks, in which attackers can transfer the file representation of a model to steal critical intellectual property housed in it, are the last form of attack recognized by Google’s specialized AI red team.

They can utilize that data to create their models, which they can exploit to offer attackers special powers in custom-crafted assaults.

Recommendation

Traditional security measures like making sure the models and systems are securely locked down may greatly reduce danger.

The researchers advise businesses to use red teaming in their work processes to support product creation and research efforts.

Gurubaran
Gurubaran
Gurubaran is a co-founder of Cyber Security News and GBHackers On Security. He has 10+ years of experience as a Security Consultant, Editor, and Analyst in cybersecurity, technology, and communications.

Latest articles

GitLab Security Update – Patch for Multiple Vulnerabilities

GitLab, the widely adopted DevOps platform, has announced the immediate release of versions 17.8.1, 17.7.3,...

Critical Vulnerability in Meta Llama Framework Let Remote Attackers Execute Arbitrary Code

The Oligo Research team has disclosed a critical vulnerability in Meta’s widely used Llama-stack...

INE Security Alert: Expediting CMMC 2.0 Compliance

INE Security, a leading global provider of cybersecurity training and certifications, today announced a...

Subaru’s STARLINK Connected Car’s Vulnerability Let Attackers Gain Restricted Access

In a groundbreaking discovery on November 20, 2024, cybersecurity researchers Shubham Shah and a...

API Security Webinar

Free Webinar - DevSecOps Hacks

By embedding security into your CI/CD workflows, you can shift left, streamline your DevSecOps processes, and release secure applications faster—all while saving time and resources.

In this webinar, join Phani Deepak Akella ( VP of Marketing ) and Karthik Krishnamoorthy (CTO), Indusface as they explores best practices for integrating application security into your CI/CD workflows using tools like Jenkins and Jira.

Discussion points

Automate security scans as part of the CI/CD pipeline.
Get real-time, actionable insights into vulnerabilities.
Prioritize and track fixes directly in Jira, enhancing collaboration.
Reduce risks and costs by addressing vulnerabilities pre-production.

More like this

GitLab Security Update – Patch for Multiple Vulnerabilities

GitLab, the widely adopted DevOps platform, has announced the immediate release of versions 17.8.1, 17.7.3,...

Critical Vulnerability in Meta Llama Framework Let Remote Attackers Execute Arbitrary Code

The Oligo Research team has disclosed a critical vulnerability in Meta’s widely used Llama-stack...

INE Security Alert: Expediting CMMC 2.0 Compliance

INE Security, a leading global provider of cybersecurity training and certifications, today announced a...