Saturday, April 26, 2025

Cloudflare Attributes Service Outage to Faulty Password Rotation

Cloudflare experienced a significant service outage that affected several of its key offerings, including R2 object storage, Cache Reserve, Images, Log Delivery, Stream, and Vectorize.

The incident, which lasted 1 hour and 7 minutes, was traced back to a faulty credential rotation process for the R2 Gateway service.

Incident Overview

The outage began at 21:38 UTC and ended at 22:45 UTC. During this time, all write operations to R2 failed, while about 35% of read operations were unsuccessful globally.

However, there was no data loss or corruption, as any successful uploads and mutations persisted.

Cloudflare attributed the failure to human error during the credential rotation process, where new credentials were inadvertently deployed to a development instance of the R2 Gateway service instead of the production environment.

Impact on Services

The outage had wide-ranging effects across various Cloudflare services:

  • R2: All object write operations failed, and 35% of read operations were unsuccessful. Customers accessing public assets via custom domains saw a reduced error rate due to cached object reads.
  • Billing: Customers encountered issues accessing past invoices.
  • Cache Reserve: An increase in requests to origins occurred due to failed R2 reads.
  • Email Security: Customer-facing metrics were not updated.
  • Images: All uploads failed, and the share of images delivered successfully dropped to about 25%.
  • Key Transparency Auditor: All operations failed during the incident.
  • Log Delivery: Log processing was delayed by up to 70 minutes.
  • Stream: Uploads failed, and video segment delivery was impacted, causing intermittent stalls.
  • Vectorize: Queries and operations on indexes were affected, with all insert and upsert operations failing.

The problem originated when the R2 engineering team omitted the --env parameter during the credential rotation process, inadvertently deploying the new credentials to a non-production environment.

When the old credentials were removed, the production R2 Gateway service lacked access to the new credentials, causing authentication issues with the storage infrastructure.
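
The failure sequence described above can be illustrated with a small sketch. This is not Cloudflare's tooling: the Environment class, the "dev" default, and the key IDs are hypothetical, chosen only to show why removing the old credential before confirming that production holds the new one leads to authentication failures.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical stand-in for a deployed service instance and its secrets."""
    name: str
    credential_ids: set = field(default_factory=set)


def deploy_credential(envs, credential_id, env=None):
    """Deploy a credential; with no explicit env, fall back to an assumed default."""
    target = env if env is not None else "dev"  # assumption: default is non-production
    envs[target].credential_ids.add(credential_id)
    return target


def can_authenticate(env, valid_ids):
    """A service authenticates only if it holds at least one still-valid credential."""
    return bool(env.credential_ids & valid_ids)


def safe_to_revoke(envs, new_id, required_env="production"):
    """Allow revoking the old credential only once the required env holds the new one."""
    return new_id in envs[required_env].credential_ids


envs = {
    "dev": Environment("dev"),
    "production": Environment("production", {"old-key"}),
}
valid_ids = {"old-key", "new-key"}

# Rotation with the environment omitted: the new key lands in dev.
landed_in = deploy_credential(envs, "new-key")

# A pre-revocation check would catch the mistake at this point.
revoke_ok = safe_to_revoke(envs, "new-key")

# Skipping the check and revoking the old key breaks production authentication.
valid_ids.discard("old-key")
production_up = can_authenticate(envs["production"], valid_ids)

print(landed_in, revoke_ok, production_up)  # dev False False
```

The point of the sketch is the ordering: a check equivalent to safe_to_revoke, run before the old credential is removed, turns a silent misdeployment into a blocked step.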

Resolution and Preventative Measures

Cloudflare quickly resolved the incident by deploying the correct credentials to the production R2 Gateway service. To prevent similar incidents in the future, the company has implemented several changes:

  • Enhanced Logging: Added logging tags to track credential usage.
  • Process Updates: Mandated explicit confirmation of credential IDs and introduced a requirement for at least two people to validate changes.
  • Automated Deployment Tools: Shifted to using hotfix release tooling to reduce human error.
  • Improved Monitoring: Began upgrading observability platforms to provide clearer insight into endpoint issues.
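
As an illustration of the process updates listed above, the following is a minimal sketch of a rotation wrapper that refuses to run without an explicit environment, requires the credential ID to be typed twice, and requires two distinct approvers. It is not Cloudflare's hotfix release tooling; apart from --env, which mirrors the parameter named in the incident, every flag name and value here is a hypothetical choice for the example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Hypothetical credential rotation wrapper"
    )
    # required=True means there is no silent default environment.
    parser.add_argument("--env", required=True,
                        choices=["dev", "staging", "production"])
    # The credential ID must be supplied twice and must match.
    parser.add_argument("--credential-id", required=True)
    parser.add_argument("--confirm-credential-id", required=True)
    # May be repeated: --approver alice --approver bob
    parser.add_argument("--approver", action="append", default=None)
    return parser


def validate(args):
    """Return a list of human-readable problems; an empty list means proceed."""
    errors = []
    if args.credential_id != args.confirm_credential_id:
        errors.append("credential ID confirmation does not match")
    if len(set(args.approver or [])) < 2:
        errors.append("at least two distinct approvers are required")
    return errors


args = build_parser().parse_args([
    "--env", "production",
    "--credential-id", "key-123",
    "--confirm-credential-id", "key-123",
    "--approver", "alice",
    "--approver", "bob",
])
problems = validate(args)
print(problems)  # []
```

Making the environment a required argument removes the default that allowed the original mistake; calling parse_args without --env exits with a usage error instead of deploying anywhere.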

Cloudflare has expressed deep regret for the disruptions caused and is committed to continuous improvements in resilience and reliability across its services.

This incident highlights the importance of robust process validation and automation in critical system maintenance tasks.

Divya
Divya is a Senior Journalist at GBhackers covering Cyber Attacks, Threats, Breaches, Vulnerabilities and other happenings in the cyber world.
