Wednesday, February 26, 2025

MITRE Releases OCCULT Framework to Address AI Security Challenges


MITRE has unveiled the Offensive Cyber Capability Unified LLM Testing (OCCULT) framework, a groundbreaking methodology designed to evaluate risks posed by large language models (LLMs) in autonomous cyberattacks.

Announced on February 26, 2025, the initiative responds to growing concerns that AI systems could democratize offensive cyber operations (OCO), enabling malicious actors to scale attacks with unprecedented efficiency.

Cybersecurity experts have long warned that LLMs’ ability to generate code, analyze vulnerabilities, and synthesize technical knowledge could lower barriers to executing sophisticated cyberattacks.

Traditional OCOs require specialized skills, resources, and coordination, but LLMs threaten to automate these processes—potentially enabling rapid exploitation of networks, data exfiltration, and ransomware deployment.

MITRE’s research highlights that newer models like DeepSeek-R1 already demonstrate alarming proficiency, scoring over 90% on offensive cybersecurity knowledge tests.

Inside the OCCULT Framework

OCCULT introduces a standardized approach to assess LLMs across three dimensions:

  1. OCO Capability Areas: Tests align with real-world tactics from frameworks like MITRE ATT&CK®, covering credential theft, lateral movement, and privilege escalation.
  2. Use Cases: Evaluations measure if an LLM acts as a knowledge assistant, collaborates with tools (co-orchestration), or operates autonomously.
  3. Reasoning Power: Scenarios test planning, environmental perception, and adaptability—key indicators of an AI’s ability to navigate dynamic networks.

The framework’s rigor lies in its avoidance of simplistic benchmarks.

Instead, OCCULT emphasizes multi-step, realistic simulations where LLMs must demonstrate strategic thinking, such as pivoting through firewalls or evading detection.

Conceptual view of the OCCULT LLM Evaluation

Key Evaluations and Findings

MITRE’s preliminary tests against leading LLMs revealed critical insights:

  • TACTL Benchmark: DeepSeek-R1 aced a 183-question assessment of offensive tactics, achieving 91.8% accuracy, while Meta’s Llama 3.1 and GPT-4o trailed closely. The benchmark includes dynamic variables to prevent memorization, forcing models to apply conceptual knowledge.
  • BloodHound Equivalency: Models analyzed synthetic Active Directory data to identify attack paths. While Mixtral 8x22B achieved 60% accuracy on simple tasks, its performance dropped in complex scenarios, exposing gaps in contextual reasoning.
  • CyberLayer Simulations: In a simulated enterprise network, Llama 3.1 70B excelled at lateral movement using living-off-the-land techniques, completing objectives in 8 steps—far outpacing random agents (130 steps).
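BloodHound treats Active Directory as a graph whose edges are abusable relationships (group membership, local admin rights, logged-on sessions) and searches for chains that lead to a high-value target. The sketch below is a rough, hypothetical illustration of the kind of "attack path" a model is asked to find in the BloodHound-equivalency tests. It runs a breadth-first search for the fewest-hop path on a tiny invented graph. The node names and graph are made up for illustration; the edge labels borrow BloodHound-style relationship names. This is not MITRE's evaluation code, and using such a computed path as a reference answer to grade an LLM is only an assumption.

```python
from collections import deque

# Hypothetical toy Active Directory graph (invented for illustration).
# Each entry maps a node to (neighbor, relationship) pairs an attacker could abuse.
EDGES = {
    "user:alice": [("group:helpdesk", "MemberOf"), ("host:fs01", "CanRDP")],
    "group:helpdesk": [("host:ws01", "AdminTo")],
    "host:ws01": [("user:bob", "HasSession")],
    "host:fs01": [("user:carol", "HasSession")],
    "user:carol": [("group:it", "MemberOf")],
    "group:it": [("user:bob", "GenericAll")],
    "user:bob": [("group:domain admins", "MemberOf")],
    "group:domain admins": [],
}


def shortest_attack_path(graph, start, target):
    """Breadth-first search for the fewest-hop path from start to target.

    Returns a list of (source, relationship, destination) steps,
    an empty list if start == target, or None if target is unreachable.
    """
    queue = deque([start])
    parent = {start: None}  # node -> (previous node, relationship used to reach it)
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the start, then reverse.
            steps = []
            while parent[node] is not None:
                prev, label = parent[node]
                steps.append((prev, label, node))
                node = prev
            return list(reversed(steps))
        for nxt, label in graph.get(node, []):
            if nxt not in parent:  # first visit in BFS is the shortest route
                parent[nxt] = (node, label)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    path = shortest_attack_path(EDGES, "user:alice", "group:domain admins")
    for src, label, dst in path:
        print(f"{src} -[{label}]-> {dst}")
```

Both routes in the toy graph end at Domain Admins. BFS returns the four-hop route through `group:helpdesk` and `host:ws01` rather than the five-hop route through `host:fs01`. Real BloodHound-style analysis also weighs edge types, and the "complex scenarios" mentioned above presumably involve far larger graphs than this one.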

Cybersecurity professionals have praised OCCULT for bridging a critical gap. “Current benchmarks often miss the mark by testing narrow skills,” said Marissa Dotter, OCCULT co-author.

“Our framework contextualizes risks by mirroring how attackers use AI.” The approach has drawn comparisons to MITRE’s ATT&CK framework, which revolutionized threat modeling by cataloging real adversary behaviors.

However, some experts caution against overestimating LLMs. Initial tests show models struggle with advanced tasks like zero-day exploitation or operationalizing novel vulnerabilities.

“AI isn’t replacing hackers yet, but it’s a force multiplier,” noted ethical hacker Alex Stamos. “OCCULT helps us pinpoint where defenses must evolve.”

MITRE plans to open-source OCCULT’s test cases, including TACTL and BloodHound evaluations, to foster collaboration.

The team also announced a 2025 expansion of the CyberLayer simulator, adding cloud and IoT attack scenarios.

Crucially, MITRE urges community participation to expand OCCULT’s coverage. “No single team can replicate every attack vector,” said lead investigator Michael Kouremetis.

“We need collective expertise to build benchmarks for AI-driven social engineering, supply chain attacks, and more.”

As AI becomes a double-edged sword in cybersecurity, frameworks like OCCULT provide essential tools to anticipate and mitigate risks.

By rigorously evaluating LLMs against real-world attack patterns, MITRE aims to arm defenders with actionable insights—ensuring AI’s transformative potential isn’t overshadowed by its perils.


Divya
Divya is a Senior Journalist at GBhackers covering Cyber Attacks, Threats, Breaches, Vulnerabilities and other happenings in the cyber world.
