MITRE has unveiled the Offensive Cyber Capability Unified LLM Testing (OCCULT) framework, a groundbreaking methodology designed to evaluate risks posed by large language models (LLMs) in autonomous cyberattacks.
Announced on February 26, 2025, the initiative responds to growing concerns that AI systems could democratize offensive cyber operations (OCO), enabling malicious actors to scale attacks with unprecedented efficiency.
Cybersecurity experts have long warned that LLMs’ ability to generate code, analyze vulnerabilities, and synthesize technical knowledge could lower barriers to executing sophisticated cyberattacks.
Traditional OCOs require specialized skills, resources, and coordination, but LLMs threaten to automate these processes—potentially enabling rapid exploitation of networks, data exfiltration, and ransomware deployment.
MITRE’s research highlights that newer models like DeepSeek-R1 already demonstrate alarming proficiency, scoring over 90% on offensive cybersecurity knowledge tests.
OCCULT introduces a standardized approach to assess LLMs across three dimensions.
The framework’s rigor lies in its avoidance of simplistic benchmarks.
Instead, OCCULT emphasizes multi-step, realistic simulations where LLMs must demonstrate strategic thinking, such as pivoting through firewalls or evading detection.
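To illustrate why a multi-step simulation is a harder test than a quiz, the sketch below contrasts two scoring styles. It is not OCCULT's actual scoring method, which the announcement does not spell out. The `StepResult` type, the step names, and both functions are hypothetical and exist only for illustration. The assumed rule is that a scenario counts only the steps completed before the first failure, so one missed step caps the score.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str     # hypothetical step label, e.g. "pivot"
    passed: bool  # whether the model completed this step


def single_question_accuracy(answers, key):
    """Fraction of isolated questions answered correctly."""
    if not key:
        return 0.0
    correct = sum(1 for a, k in zip(answers, key) if a == k)
    return correct / len(key)


def scenario_progress(steps):
    """Fraction of ordered steps completed before the first failure."""
    if not steps:
        return 0.0
    completed = 0
    for step in steps:
        if not step.passed:
            break  # later steps depend on this one, so stop counting
        completed += 1
    return completed / len(steps)


# Illustrative data only; these are not results from MITRE's tests.
quiz_score = single_question_accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"])
scenario = [
    StepResult("recon", True),
    StepResult("pivot", True),
    StepResult("evade_detection", False),
    StepResult("exfiltrate", True),
]
chain_score = scenario_progress(scenario)
print(quiz_score, chain_score)
```

With this made-up data, the quiz-style score is 0.75 while the scenario score is 0.5, because the failed third step blocks credit for everything after it. That gap is the kind of difference a benchmark built from isolated questions would not surface.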
MITRE’s preliminary tests against leading LLMs revealed critical insights.
Cybersecurity professionals have praised OCCULT for bridging a critical gap. “Current benchmarks often miss the mark by testing narrow skills,” said Marissa Dotter, OCCULT co-author.
“Our framework contextualizes risks by mirroring how attackers use AI.” The approach has drawn comparisons to MITRE’s ATT&CK framework, which revolutionized threat modeling by cataloging real adversary behaviors.
However, some experts caution against overestimating LLMs. Initial tests show models struggle with advanced tasks like zero-day exploitation or operationalizing novel vulnerabilities.
“AI isn’t replacing hackers yet, but it’s a force multiplier,” noted ethical hacker Alex Stamos. “OCCULT helps us pinpoint where defenses must evolve.”
MITRE plans to open-source OCCULT’s test cases, including TACTL and BloodHound evaluations, to foster collaboration.
The team also announced a 2025 expansion of the CyberLayer simulator, adding cloud and IoT attack scenarios.
Crucially, MITRE urges community participation to expand OCCULT’s coverage. “No single team can replicate every attack vector,” said lead investigator Michael Kouremetis.
“We need collective expertise to build benchmarks for AI-driven social engineering, supply chain attacks, and more.”
As AI becomes a double-edged sword in cybersecurity, frameworks like OCCULT provide essential tools to anticipate and mitigate risks.
By rigorously evaluating LLMs against real-world attack patterns, MITRE aims to arm defenders with actionable insights—ensuring AI’s transformative potential isn’t overshadowed by its perils.