
AI Package Hallucination – Hackers Abusing ChatGPT, Gemini to Spread Malware


The research investigates the persistence and scale of AI package hallucination, a technique in which attackers exploit large language models' tendency to recommend non-existent packages by registering malicious code under those hallucinated names.

The Langchain framework allowed the researchers to expand on previous findings by testing a more comprehensive range of questions, programming languages (Python, Node.js, Go, .NET, and Ruby), and models (GPT-3.5-Turbo, GPT-4, Gemini (formerly Bard), and Cohere).

The aim is to assess whether hallucinations persist over time, generalize across models (cross-model hallucinations), and recur when the same prompt is repeated (repetitiveness).

Langchain default prompt

The researchers refined 2,500 questions into 47,803 "how-to" prompts that were fed to the models, while repetitiveness was tested by asking 20 questions with confirmed hallucinations 100 times each.
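
To make the repetitiveness test concrete, here is a minimal sketch of how one could ask a model the same how-to question repeatedly and count recurring package recommendations. It assumes the OpenAI Python SDK (the study itself used Langchain) and a naive regex extractor; the question and model name are illustrative, not taken from the research.

```python
# Minimal repetitiveness-test sketch. Assumptions: the `openai` SDK with an
# OPENAI_API_KEY in the environment, and a naive regex that pulls package
# names out of "pip install ..." lines -- not the researchers' actual harness.
import re
from collections import Counter

from openai import OpenAI

client = OpenAI()

QUESTION = "How do I upload a model to the Hugging Face Hub from the command line?"
PIP_RE = re.compile(r"pip install\s+([A-Za-z0-9._-]+)")

def recommended_packages(question: str) -> set[str]:
    """Ask the model once and extract package names from any `pip install` lines."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return set(PIP_RE.findall(resp.choices[0].message.content or ""))

# Ask the same question 100 times and count how often each package recurs.
counts = Counter()
for _ in range(100):
    counts.update(recommended_packages(QUESTION))

for pkg, n in counts.most_common():
    print(f"{pkg}: recommended in {n}/100 runs")
```

A name that comes back in most runs, yet exists in no registry, is exactly the kind of stable hallucination an attacker could squat.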

Results of GPT-4

The study compared four large language models (LLMs), GPT-4, GPT-3.5, Gemini, and Cohere, for their susceptibility to generating hallucinations (factually incorrect outputs).

Gemini produced the most hallucinations (64.5%), while Cohere produced the fewest (29.1%). Interestingly, hallucinations with potential for exploitation were rare in some ecosystems, due to factors like decentralized package repositories (Go) or reserved naming conventions (.NET).
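
For the Python ecosystem, exploitability comes down to whether a hallucinated name is still unclaimed on PyPI. A short sketch of that check, using PyPI's public JSON API (the package names below are placeholders, not findings from the study):

```python
# Sketch: check whether hallucinated names are unclaimed on PyPI and
# therefore squattable. Assumes the `requests` library; the names in
# `hallucinated` are placeholders, not real findings.
import requests

def exists_on_pypi(name: str) -> bool:
    """PyPI's JSON API returns HTTP 200 for registered packages, 404 otherwise."""
    r = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return r.status_code == 200

hallucinated = ["example-hallucinated-pkg", "requests"]  # placeholders
for name in hallucinated:
    status = "registered" if exists_on_pypi(name) else "UNCLAIMED (squattable)"
    print(f"{name}: {status}")
```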

Results of Gemini

Lasso Security's study also showed that Gemini and GPT-3.5 shared the most hallucinations in common, suggesting their architectures may be similar at a deeper level. This information is essential for understanding and reducing hallucinations in LLMs.


To study cross-model hallucinations, the researchers collected the hallucinated packages produced by each model and compared them to see what they have in common.

This analysis revealed 215 packages hallucinated by more than one model, with the greatest overlap between Gemini and GPT-3.5 and the least between Cohere and GPT-4.

Result of cross-model hallucinations

This cross-model hallucination analysis offers valuable insights into the phenomenon of hallucinations in LLMs, potentially leading to a better understanding of these systems’ internal workings.  
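
The comparison itself reduces to set intersections over each model's hallucinated-package list. A sketch with placeholder data, shaped to mirror the reported pattern (largest overlap between Gemini and GPT-3.5, smallest between Cohere and GPT-4):

```python
# Cross-model overlap sketch. The package sets are illustrative placeholders,
# not the 215 packages identified in the study.
from itertools import combinations

hallucinations = {
    "GPT-4":   {"pkg-a", "pkg-b"},
    "GPT-3.5": {"pkg-a", "pkg-c", "pkg-d"},
    "Gemini":  {"pkg-a", "pkg-c", "pkg-e"},
    "Cohere":  {"pkg-f"},
}

for (m1, s1), (m2, s2) in combinations(hallucinations.items(), 2):
    shared = s1 & s2
    print(f"{m1} & {m2}: {len(shared)} shared -> {sorted(shared)}")
```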

The researchers also observed developers unknowingly attempting to download a non-existent Python package called "huggingface-cli," suggesting that large language models may be providing users with inaccurate information about available packages.

Screenshot of ChatGPT

To investigate further, the researchers uploaded two dummy packages: “huggingface-cli” (empty) and “blabladsa123” (also empty). 

They then monitored download rates over three months; the fake "huggingface-cli" package received over 30,000 downloads, significantly exceeding the control package "blabladsa123."

The fake, empty package received more than 30,000 authentic downloads

This suggests a possible vulnerability in which developers rely on incomplete or inaccurate information sources to discover Python packages.
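
As a defensive habit (not something prescribed by the study), a developer could vet an LLM-recommended package against PyPI metadata before installing it, flagging names that do not exist or that appeared only very recently:

```python
# Defensive vetting sketch, assuming the `requests` library. Flags packages
# that are missing from PyPI or that look suspiciously new.
from datetime import datetime, timezone

import requests

def vet_package(name: str) -> None:
    r = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    if r.status_code != 200:
        print(f"{name}: not on PyPI -- a classic hallucination candidate")
        return
    data = r.json()
    releases = data.get("releases", {})
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    age_days = (datetime.now(timezone.utc) - min(uploads)).days if uploads else None
    summary = data["info"].get("summary") or "(no description)"
    print(f"{name}: {len(releases)} release(s), first upload "
          f"{age_days} day(s) ago, summary: {summary!r}")

vet_package("huggingface-cli")  # the name from the experiment
```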

Because the package was believed to be a hallucination rather than a real project, the researchers verified its adoption by searching the GitHub repositories of major companies, and the search identified references to the package in repositories of several large companies.

Installing the packages found in the README

For example, a repository containing Alibaba’s research included instructions on installing this package in its README file. 

These findings suggest that either the package is genuine and used by these companies, or that instructions for non-existent packages are being included in documentation on a widespread scale.
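
One way to reproduce this kind of search is GitHub's code-search REST API, querying for READMEs that tell users to install the package. Whether the researchers used this API is not stated in the article; the sketch below assumes a personal access token in GITHUB_TOKEN, since unauthenticated code search is rejected.

```python
# Sketch: search GitHub for READMEs instructing users to install a package.
# Uses GitHub's /search/code REST endpoint; a token in GITHUB_TOKEN is assumed.
import os

import requests

def find_install_instructions(package: str) -> None:
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": f'"pip install {package}" filename:README.md'},
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json().get("items", []):
        print(item["repository"]["full_name"], "->", item["path"])

find_install_instructions("huggingface-cli")
```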
