Google claims Big Sleep 'first' AI to spot freshly committed security bug that fuzzing missed

You snooze, you lose, er, win


Google claims one of its AI models is the first of its kind to spot a memory safety vulnerability in the wild – specifically an exploitable stack buffer underflow in SQLite – which was then fixed before the buggy code's official release.

The Chocolate Factory's LLM-based bug-hunting tool, dubbed Big Sleep, is a collaboration between Google's Project Zero and DeepMind. This software is said to be an evolution of the earlier Project Naptime, announced in June.

SQLite is an open source database engine, and the stack buffer underflow vulnerability could have allowed an attacker to cause a crash or perhaps even achieve arbitrary code execution. More specifically, the crash or code execution would happen in the SQLite executable (not the library) due to a magic value of -1 accidentally being used at one point as an array index. There is an assert() in the code to catch the use of -1 as an index, but in release builds, this debug-level check would be removed.
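To make that class of bug concrete, here is a minimal sketch in C of a sentinel value slipping through as an array index. It is not SQLite's actual code: the names `NO_COLUMN`, `mark_used_unchecked`, and `mark_used_checked` are hypothetical, invented purely for illustration. The first function relies on an assert() alone, which disappears when the code is compiled with `-DNDEBUG`; the second keeps a bounds check that survives in every build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no ordinary column".
 * Illustrative only; these names are not taken from SQLite. */
#define NO_COLUMN (-1)

/* Risky pattern: the only guard is an assert(), which the compiler
 * drops when building with -DNDEBUG (typical for release builds). */
void mark_used_unchecked(int *used, int column) {
    assert(column >= 0);
    used[column] = 1; /* column == -1 would write before the array */
}

/* Hardened variant: the bounds check survives in every build.
 * Returns 0 on success, -1 if the index is rejected. */
int mark_used_checked(int *used, size_t n_columns, int column) {
    if (column < 0 || (size_t)column >= n_columns) {
        return -1;
    }
    used[column] = 1;
    return 0;
}
```

In a debug build, passing `NO_COLUMN` to the unchecked version trips the assertion and aborts. In a release build the same call quietly writes one element before the start of the array, which is the sort of underflow described above.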

Thus, a miscreant could cause a crash or achieve code execution on a victim's machine by, perhaps, triggering that bad index bug with a maliciously crafted database shared with that user or through some SQL injection. Even the Googlers admit the flaw is non-trivial to exploit, so be aware that the severity of the hole is not really the news here – it's that the web giant believes its AI has scored a first.

We're told that fuzzing – feeding random and/or carefully crafted data into software to uncover exploitable bugs – didn't find the issue.

The LLM, however, did. According to Google, this is the first time an AI agent has found a previously unknown exploitable memory-safety flaw in widely used real-world software. After Big Sleep clocked the bug in early October, having been told to go through a bunch of commits to the project's source code, SQLite's developers fixed it on the same day. Thus the flaw was removed before an official release.

"We think that this work has tremendous defensive potential," the Big Sleep team crowed in a November 1 write-up. "Fuzzing has helped significantly, but we need an approach that can help defenders to find the bugs that are difficult (or impossible) to find by fuzzing, and we're hopeful that AI can narrow this gap." 
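For a sense of why some bugs are hard to reach with purely random inputs, consider the toy sketch below. It says nothing about how SQLite is actually fuzzed or why this particular flaw was missed; the target function `hits_buggy_path`, its four-byte trigger, and the `fuzz_random` loop are all assumptions made up for illustration. The point is only that a code path guarded by an exact four-byte prefix is hit by uniformly random bytes roughly once in 2^32 attempts.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target: the "buggy" path is reached only when the input
 * starts with one exact 4-byte sequence. Not based on any real parser. */
int hits_buggy_path(const unsigned char *data, size_t len) {
    return len >= 4 && memcmp(data, "\x53\x51\x4c\xff", 4) == 0;
}

/* Naive random fuzzing: throw random 8-byte inputs at the target and
 * count how many reach the buggy path. */
int fuzz_random(unsigned seed, int iterations) {
    unsigned char buf[8];
    int hits = 0;

    srand(seed);
    for (int i = 0; i < iterations; i++) {
        for (size_t j = 0; j < sizeof buf; j++) {
            buf[j] = (unsigned char)(rand() & 0xff);
        }
        hits += hits_buggy_path(buf, sizeof buf);
    }
    return hits;
}
```

Real fuzzers are far smarter than this loop, using coverage feedback and hand-written harnesses, but the same basic problem remains: conditions that depend on very specific inputs or configurations can stay out of reach.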

We should note that in October, Seattle-based Protect AI announced a free, open source tool that it claimed can find zero-day vulnerabilities in Python codebases with an assist from Anthropic's Claude AI model.

This tool is called Vulnhuntr and, according to its developers, it has found more than a dozen zero-day bugs in large, open source Python projects.

The two tools have different purposes, according to Google. "Our assertion in the blog post is that Big Sleep discovered the first unknown exploitable memory-safety issue in widely used real-world software," a Google spokesperson told The Register, with our emphasis added. "The Python LLM finds different types of bugs that aren't related to memory safety."

Big Sleep, which is still in the research stage, had until now been evaluated only on small programs with known vulnerabilities. This was its first real-world experiment.

For the test, the team collected several recent commits to the SQLite repository. After manually removing trivial and document-only changes, "we then adjusted the prompt to provide the agent with both the commit message and a diff for the change, and asked the agent to review the current repository (at HEAD) for related issues that might not have been fixed," the team wrote.

The LLM, based on Gemini 1.5 Pro, ultimately found the bug, which was loosely related to changes in the seed commit [1976c3f7]. "This is not uncommon in manual variant analysis, understanding one bug in a codebase often leads a researcher to other problems," the Googlers explained.

In the write-up, the Big Sleep team also detailed the "highlights" of the steps that the agent took to evaluate the code, find the vulnerability, crash the system, and then produce a root-cause analysis.

"However, we want to reiterate that these are highly experimental results," they wrote. "The position of the Big Sleep team is that at present, it's likely that a target-specific fuzzer would be at least as effective (at finding vulnerabilities)." ®
