Enhancing Fuzz Testing: Detecting More Vulnerabilities with AI
Recently, OSS-Fuzz reported 26 new vulnerabilities to maintainers of open source projects, including a flaw in the critical OpenSSL library (CVE-2024-9143) that underpins much of the internet's infrastructure. Reports like these are not unusual: we've identified and helped fix more than 11,000 vulnerabilities over the project's 8-year history.
However, these particular vulnerabilities represent a milestone for automated vulnerability finding: each was found with AI, using AI-generated and enhanced fuzz targets. The OpenSSL CVE is one of the first vulnerabilities in critical software discovered by LLMs, providing another concrete example following Google's recent discovery of an exploitable stack buffer underflow in the widely used database engine SQLite.
This blog post covers the results and lessons from eighteen months of work to bring AI-powered fuzzing to this point, both in introducing AI into fuzz target generation and in expanding it to emulate a developer's workflow. These efforts continue our explorations of how AI can transform vulnerability discovery and strengthen defenses for everyone.
Our progress so far
In August 2023, the OSS-Fuzz team announced AI-Powered Fuzzing, setting out our goal of using large language models (LLMs) to improve fuzzing coverage and find more vulnerabilities automatically, before malicious attackers could exploit them. Our approach was to use the coding abilities of an LLM to generate more fuzz targets, which are similar to unit tests that exercise relevant functionality to search for vulnerabilities.
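To make that concrete, here is a minimal sketch of a libFuzzer-style fuzz target; the parse_message function is a hypothetical stand-in for real project code:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the project function under test; a real
// harness would call the library's actual API instead.
static bool parse_message(const std::string &input) {
  return !input.empty() && input.front() == '{';
}

// libFuzzer entry point: the fuzzing engine calls this repeatedly with
// mutated inputs, watching for crashes, hangs, and sanitizer reports.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  parse_message(std::string(reinterpret_cast<const char *>(data), size));
  return 0;
}
```

Built with a command like clang++ -g -fsanitize=address,fuzzer target.cc, this binary generates and mutates its own inputs. Writing one such target per interesting API, by hand, is the manual effort we set out to automate.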
The ideal solution would be to fully automate the manual process of developing a fuzz target end to end:
1. Drafting an initial fuzz target.
2. Fixing any compilation issues that arise.
3. Running the fuzz target to see how it performs, and fixing any obvious mistakes causing runtime issues.
4. Running the corrected fuzz target for a longer period of time, and triaging any crashes to determine the root cause.
5. Fixing vulnerabilities.
In August 2023, we covered our efforts to use an LLM to handle the first two steps. We were able to use an iterative process to generate a fuzz target with a simple prompt that included hardcoded examples and compilation errors.
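As a rough sketch of that iteration (the ask_llm and try_compile helpers below are hypothetical stand-ins, not part of OSS-Fuzz or its tooling):

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins: a call to the LLM, and a build attempt that
// returns the compiler's error output on failure (std::nullopt on success).
std::string ask_llm(const std::string &prompt) { return "/* fuzz target */"; }
std::optional<std::string> try_compile(const std::string &src) {
  return std::nullopt;
}

// Iterative generation: draft a target, and while it fails to build,
// feed the compiler errors back into the prompt and ask for a fix.
std::string generate_target(const std::string &base_prompt) {
  std::string target = ask_llm(base_prompt);
  for (int attempt = 0; attempt < 5; ++attempt) {
    std::optional<std::string> error = try_compile(target);
    if (!error) break;  // it builds; hand it off to the next step
    target = ask_llm(base_prompt +
                     "\nThe previous attempt failed to compile with:\n" + *error);
  }
  return target;
}
```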
In January 2024, we open sourced the framework we had been building to enable an LLM to generate fuzz targets. By that point, LLMs were reliably generating targets that exercised more interesting code coverage across 160 projects. But there was still a long tail of projects where we couldn't get a single working AI-generated fuzz target.
To address this, we've improved the first two steps, and added steps 3 and 4 as well.
New results: More code coverage and discovered vulnerabilities
We can now automatically gain more coverage in 272 C/C++ projects on OSS-Fuzz (up from 160), adding more than 370k lines of new code coverage. The top coverage improvement in a single project was an increase from 77 lines to 5434 lines (a 7000% increase).
This led to the discovery of 26 new vulnerabilities in projects on OSS-Fuzz that had already been fuzzed for hundreds of thousands of hours. The highlight is CVE-2024-9143 in the critical and well-tested OpenSSL library. We reported this vulnerability on September 16, and a fix was published on October 16. As far as we can tell, this vulnerability has likely been present for two decades and would not have been discoverable with existing fuzz targets written by humans.
Another example was a bug in the cJSON project, where even though a human-written harness already existed to fuzz a specific function, we still discovered a new vulnerability in that same function with an AI-generated target.
One reason bugs like these can remain undiscovered for so long is that line coverage does not guarantee that a function is free of bugs. Code coverage as a metric cannot measure all possible code paths and states: different flags and configurations may trigger different behaviors, unearthing different bugs. These examples underscore the need to keep generating new varieties of fuzz targets even for code that is already fuzzed, as has also been shown by Project Zero in the past (1, 2).
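As a hedged illustration of that point (decode and its flags parameter are hypothetical), a target can spend part of its input on configuration, so the fuzzer explores different settings of the same function rather than only different payloads:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical library call whose behavior varies with its flags; a real
// harness would call the project's actual API here.
static int decode(unsigned flags, const uint8_t *buf, size_t len) {
  return (flags & 1) ? (len > 0 ? buf[0] : 0) : 0;
}

// Covering a line once does not prove it safe: the same statements can be
// reached under different flags and states. This target spends the first
// input byte on a flags value so configurations get fuzzed too.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 1) return 0;
  unsigned flags = data[0];           // fuzz the configuration...
  decode(flags, data + 1, size - 1);  // ...as well as the payload
  return 0;
}
```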
New improvements
To achieve these results, we've been focusing on two major improvements:
- Automatically generate more relevant context in our prompts. The more complete and relevant information we can give the LLM about a project, the less likely it is to hallucinate the missing details in its reply. This meant providing more accurate, project-specific context in prompts, such as function and type definitions, cross references, and existing unit tests for each project. To generate this information automatically, we built new infrastructure to index projects across OSS-Fuzz.
- Emulate a developer's workflow. LLMs turned out to be highly effective at emulating a typical developer's entire process of writing, testing, and iterating on a fuzz target, as well as triaging the crashes it found. Thanks to this, it became viable to automate more parts of the fuzzing workflow. This additional iterative feedback in turn resulted in higher quality and a greater number of correct fuzz targets.
The workflow in action
Our LLM can now execute the first four steps of the developer's process (with the fifth coming soon).
1. Drafting an initial fuzz target.
A developer might look at the source code, existing documentation and unit tests, and usages of the target function when writing an initial fuzz target. An LLM can perform this role here, if we provide a prompt with this information and ask it to come up with a fuzz target.
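For illustration, consider an existing unit test and the fuzz target an LLM might derive from it; tokenize is a hypothetical project function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical project function; in practice this would be real library code.
static int tokenize(const uint8_t *buf, size_t len) { return len ? buf[0] : -1; }

// An existing unit test like this, included in the prompt, shows the
// model the correct setup and call pattern...
static void test_tokenize_basic() {
  const uint8_t input[] = {'a', 'b'};
  assert(tokenize(input, sizeof(input)) == 'a');
}

// ...and the derived fuzz target keeps that pattern, but replaces the
// constant input with fuzzer-controlled bytes.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  (void)&test_tokenize_basic;  // keep the sketch warning-free
  tokenize(data, size);
  return 0;
}
```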
2. Fixing any compilation issues that arise.
Once a potential target is written, a developer would try to compile it and fix any compilation issues that come up. Again, an LLM can be prompted with the details of the compilation errors so that it can fix them.
An example of compilation errors that an LLM was able to fix
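The specifics vary from project to project, but a representative (hypothetical) case looks like this: the first draft guessed a two-argument call, and feeding the resulting compiler errors back produced a call matching the real signature:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical project API; stands in for the real header whose
// signature the model's first draft did not match.
static int lib_process(const uint8_t *buf, size_t len, int mode) {
  return (len > 0 && mode >= 0) ? buf[0] : 0;
}

// The first draft called lib_process(data, size) and failed to build
// with a "too few arguments" style error. Prompted with that error,
// the model emitted a call matching the three-argument signature:
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  lib_process(data, size, /*mode=*/0);
  return 0;
}
```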
3. Running the fuzz target to see how it performs, and fixing any obvious mistakes.
Once all compilation errors are fixed, a developer would try running the fuzz target for a short period of time to check for any mistakes that cause it to instantly crash, indicating an error in the target rather than a bug discovered in the project.
Below is an example of an LLM fixing a semantic issue with the fuzzing setup:
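A typical instance of this class of fix (with a hypothetical parse_path standing in for project code): the first draft passed raw fuzzer bytes, which are not NUL-terminated, to a C-string API, so the resulting out-of-bounds read was a bug in the harness rather than in the project.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical project API that expects a NUL-terminated C string.
static void parse_path(const char *path) { (void)strlen(path); }

// Buggy draft (reads past the end of the input buffer):
//   parse_path(reinterpret_cast<const char *>(data));
//
// Corrected target: copy the input into a NUL-terminated buffer first.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string path(reinterpret_cast<const char *>(data), size);
  parse_path(path.c_str());  // c_str() guarantees the trailing '\0'
  return 0;
}
```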
4. Running the corrected fuzz target for a longer period of time, and triaging any crashes.
At this point, the fuzz target is ready to be run for an extended period of time on suitable fuzzing infrastructure, such as ClusterFuzz.
Any discovered crashes would then need to be triaged to determine their root causes and whether they represent legitimate vulnerabilities (or bugs in the fuzz target itself). An LLM can be prompted with the relevant context (stacktraces, fuzz target source code, relevant project source code) to perform this triage.
In this case, the LLM correctly identifies the crash as a bug in the fuzz target, rather than a bug in the project being fuzzed.
5. Fixing vulnerabilities.
The goal is to fully automate this entire workflow by having the LLM generate a suggested patch for the vulnerability. We don't have anything we can share here today, but we're collaborating with various researchers to make this a reality and look forward to sharing results soon.
Coming up
Improving automated triaging: to get to a point where we're confident enough not to require human review, which will enable automatically reporting new vulnerabilities to project maintainers. There are likely more than the 26 vulnerabilities we've already reported upstream hiding in our results.
Agent-based architecture: letting the LLM autonomously plan out the steps needed to solve a particular problem by giving it access to tools that let it gather more information and check and validate its results. By providing the LLM with interactive access to real tools such as debuggers, we've found that it is more likely to arrive at a correct result.
Integrating our research into OSS-Fuzz as a feature: to achieve a more fully automated end-to-end solution for vulnerability discovery and patching. We hope OSS-Fuzz will be useful for other researchers to evaluate AI-powered vulnerability discovery ideas, and that it will ultimately become a tool that enables defenders to find more vulnerabilities before they get exploited.
For more information, check out our open source framework at oss-fuzz-gen. We're hoping to continue to collaborate on this area with other researchers. Also, be sure to check out the OSS-Fuzz blog for more technical updates.
