
Cloudflare broke its logging-as-a-service service, causing customer data loss

Software snafu took five minutes to roll back. The mess it made took hours to clean up


Cloudflare has admitted that it broke its own logging-as-a-service service with a bad software update, and that customer data was lost as a result.

The network-taming firm admitted in a Tuesday post that, for roughly 3.5 hours on November 14, its Cloudflare Logs service didn't send data it collected to customers – and about 55 percent of the logs were lost.

Cloudflare Logs gathers logs generated by Cloudflare's services and sends them to customers who want to analyze them. Cloudflare suggests the logs may prove helpful "for debugging, identifying configuration adjustments, and creating analytics, especially when combined with logs from other sources, such as your application server."

Cloudflare customers often want logs from multiple servers and, as logfiles can be verbose and voluminous, the provider worries that consuming them all could prove overwhelming.

"Imagine the postal service ringing your doorbell once for each letter instead of once for each packet of letters," the post suggests. "With thousands or millions of letters each second, the number of separate transactions that would entail becomes prohibitive."

Cloudflare therefore uses a tool called Logpush to batch logs into bundles of predictable size, then push them to customers at a sensible cadence.
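For readers who want to picture what bundling into a predictable size involves, here is a minimal sketch in Python. It is not Cloudflare's code: the function name `batch_logs` and the `max_bytes` limit are hypothetical, and the real Logpush presumably also flushes on a timer to keep its cadence.

```python
from typing import Iterable, Iterator, List


def batch_logs(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group log lines into batches of at most max_bytes (UTF-8 encoded).

    Hypothetical helper for illustration only. A single line larger than
    max_bytes is still emitted, alone, in its own batch.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Flush the current batch if adding this line would exceed the limit.
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    # Emit whatever is left over once the input is exhausted.
    if batch:
        yield batch
```

Calling `list(batch_logs(["aaaa", "bbbb", "cc"], 8))` groups the first two lines into one delivery and leaves the third for the next, which is the postal service ringing once per packet rather than once per letter.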

Logs that Cloudflare provides to customers are prepared by other tools called Logfwdr and Logreceiver.

On November 14, Cloudflare made a change to Logpush, designed to support an additional dataset.

It was a buggy change – it "essentially informed Logfwdr that no customers had logs configured to be pushed."

Cloudflare staff noticed the problem and reverted the change in under five minutes.

But the incident triggered a second bug, this one in Logfwdr: under circumstances like the Logpush mess, it pushed log events for all customers into the system, instead of just those for customers who had configured a Logpush job.
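The failure pattern is easier to see in code. The sketch below is an illustration in Python under stated assumptions, not Logfwdr's implementation: the function names are hypothetical, and the assumed mechanism (an empty customer list being read as "forward everything") is one plausible way such a bug can arise.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (customer_id, log_line); a hypothetical shape


def forward_fail_open(events: Iterable[Event], configured: Set[str]) -> List[Event]:
    """Buggy variant: an empty configuration is treated as 'no filter'."""
    events = list(events)
    if not configured:
        # Assumed bug: with no customers configured, everything is forwarded.
        return events
    return [e for e in events if e[0] in configured]


def forward_fail_closed(events: Iterable[Event], configured: Set[str]) -> List[Event]:
    """Safer variant: only events for explicitly configured customers pass."""
    return [e for e in events if e[0] in configured]
```

Given an empty `configured` set, the first function returns every event for every customer, while the second returns nothing. Dropping everything is hardly ideal either, but it does not flood the systems downstream.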

The resulting flood of info is what caused the outage, and the loss of some logfiles.

Cloudflare has admonished itself for the incident. It conceded it did most of the work to prevent this sort of thing – but didn't quite finish the job. Its post likens the situation to failing to fasten a car seatbelt – the safety systems are built in and work, but they're useless if not employed.

The networking giant will try to avoid this sort of mess in future with automated alerts that mean misconfigurations "will be impossible to miss" – brave words. It also plans extra testing to prepare itself for the impact of datacenter and/or network outages and system overloads. ®
