Enhancing Fuzz Testing: Detecting More Vulnerabilities with AI
Recently, OSS-Fuzz reported 26 new vulnerabilities to maintainers of open source projects, including a flaw in the critical OpenSSL library (CVE-2024-9143) that underpins much of the internet's infrastructure. Reports like these are not unusual: we've identified and helped fix more than 11,000 vulnerabilities over the project's 8-year history.
However, these particular vulnerabilities represent a milestone for automated vulnerability finding: each was found with AI, using AI-generated and enhanced fuzz targets. The OpenSSL CVE is one of the first vulnerabilities in critical software discovered by LLMs, providing another concrete example following Google's recent discovery of an exploitable stack buffer underflow in the widely used database engine SQLite.
This blog post covers the results and lessons from eighteen months of work to bring AI-powered fuzzing to this point, both in introducing AI into fuzz target generation and in expanding it to emulate a developer's workflow. These efforts continue our explorations of how AI can transform vulnerability discovery and strengthen defenses for everyone.
Our progress so far
In August 2023, the OSS-Fuzz team announced AI-Powered Fuzzing, setting out our goal of using large language models (LLMs) to improve fuzzing coverage and find more vulnerabilities automatically, before malicious attackers could exploit them. Our approach was to use the coding abilities of an LLM to generate more fuzz targets, which are similar to unit tests that exercise relevant functionality to search for vulnerabilities.
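To make that concrete, here is a minimal sketch of a libFuzzer-style fuzz target; the parse_message function is a hypothetical stand-in for real project code:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the project function under test; a real
// harness would call the library's actual API instead.
static bool parse_message(const std::string &input) {
  return !input.empty() && input.front() == '{';
}

// libFuzzer entry point: the fuzzing engine calls this repeatedly with
// mutated inputs, watching for crashes, hangs, and sanitizer reports.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  parse_message(std::string(reinterpret_cast<const char *>(data), size));
  return 0;
}
```

Built with a command like clang++ -g -fsanitize=address,fuzzer target.cc, this binary generates and mutates its own inputs. Writing one such target per interesting API, by hand, is the manual effort we set out to automate.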
The ideal solution would be to fully automate the manual process of developing a fuzz target end to end:
1. Drafting an initial fuzz target.
2. Fixing any compilation issues that arise.
3. Running the fuzz target to see how it performs, and fixing any obvious mistakes causing runtime issues.
4. Running the corrected fuzz target for a longer period of time, and triaging any crashes to determine the root cause.
5. Fixing vulnerabilities.
In August 2023, we covered our efforts to use an LLM to handle the first two steps. We were able to use an iterative process to generate a fuzz target with a simple prompt that included hardcoded examples and compilation errors.
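As a rough sketch of that iteration (the ask_llm and try_compile helpers below are hypothetical stand-ins, not part of OSS-Fuzz or its tooling):

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins: a call to the LLM, and a build attempt that
// returns the compiler's error output on failure (std::nullopt on success).
std::string ask_llm(const std::string &prompt) { return "/* fuzz target */"; }
std::optional<std::string> try_compile(const std::string &src) {
  return std::nullopt;
}

// Iterative generation: draft a target, and while it fails to build,
// feed the compiler errors back into the prompt and ask for a fix.
std::string generate_target(const std::string &base_prompt) {
  std::string target = ask_llm(base_prompt);
  for (int attempt = 0; attempt < 5; ++attempt) {
    std::optional<std::string> error = try_compile(target);
    if (!error) break;  // it builds; hand it off to the next step
    target = ask_llm(base_prompt +
                     "\nThe previous attempt failed to compile with:\n" + *error);
  }
  return target;
}
```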
In January 2024, we open sourced the framework we had been building to enable an LLM to generate fuzz targets. By that point, LLMs were reliably generating targets that exercised more interesting code coverage across 160 projects. But there was still a long tail of projects where we couldn't get a single working AI-generated fuzz target.
To address this, we've improved the first two steps, and added steps 3 and 4 as well.
New results: More code coverage and discovered vulnerabilities
We can now automatically gain more coverage in 272 C/C++ projects on OSS-Fuzz (up from 160), adding more than 370k lines of new code coverage. The top coverage improvement in a single project was an increase from 77 lines to 5434 lines (a 7000% increase).
This led to the discovery of 26 new vulnerabilities in projects on OSS-Fuzz that had already been fuzzed for hundreds of thousands of hours. The highlight is CVE-2024-9143 in the critical and well-tested OpenSSL library. We reported this vulnerability on September 16, and a fix was published on October 16. As far as we can tell, this vulnerability has likely been present for two decades and would not have been discoverable with existing fuzz targets written by humans.
Another example was a bug in the cJSON project, where even though a human-written harness already existed to fuzz a specific function, we still discovered a new vulnerability in that same function with an AI-generated target.
One reason bugs like these can remain undiscovered for so long is that line coverage does not guarantee that a function is free of bugs. Code coverage as a metric cannot measure all possible code paths and states: different flags and configurations may trigger different behaviors, unearthing different bugs. These examples underscore the need to keep generating new varieties of fuzz targets even for code that is already fuzzed, as has also been shown by Project Zero in the past (1, 2).
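As a hedged illustration of that point (decode and its flags parameter are hypothetical), a target can spend part of its input on configuration, so the fuzzer explores different settings of the same function rather than only different payloads:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical library call whose behavior varies with its flags; a real
// harness would call the project's actual API here.
static int decode(unsigned flags, const uint8_t *buf, size_t len) {
  return (flags & 1) ? (len > 0 ? buf[0] : 0) : 0;
}

// Covering a line once does not prove it safe: the same statements can be
// reached under different flags and states. This target spends the first
// input byte on a flags value so configurations get fuzzed too.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 1) return 0;
  unsigned flags = data[0];           // fuzz the configuration...
  decode(flags, data + 1, size - 1);  // ...as well as the payload
  return 0;
}
```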
New improvements
To achieve these results, we've been focusing on two major improvements:
- Automatically generate more relevant context in our prompts. The more complete and relevant information we can give the LLM about a project, the less likely it is to hallucinate the missing details in its reply. This meant providing more accurate, project-specific context in prompts, such as function and type definitions, cross references, and existing unit tests for each project. To generate this information automatically, we built new infrastructure to index projects across OSS-Fuzz.
- Emulate a developer's workflow. LLMs turned out to be highly effective at emulating a typical developer's entire process of writing, testing, and iterating on a fuzz target, as well as triaging the crashes it found. Thanks to this, it became viable to automate more parts of the fuzzing workflow. This additional iterative feedback in turn resulted in higher quality and a greater number of correct fuzz targets.
The workflow in action
Our LLM can now execute the first four steps of the developer's process (with the fifth coming soon).
1. Drafting an initial fuzz target.
A developer might look at the source code, existing documentation and unit tests, and usages of the target function when writing an initial fuzz target. An LLM can perform this role here, if we provide a prompt with this information and ask it to come up with a fuzz target.
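For illustration, consider an existing unit test and the fuzz target an LLM might derive from it; tokenize is a hypothetical project function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical project function; in practice this would be real library code.
static int tokenize(const uint8_t *buf, size_t len) { return len ? buf[0] : -1; }

// An existing unit test like this, included in the prompt, shows the
// model the correct setup and call pattern...
static void test_tokenize_basic() {
  const uint8_t input[] = {'a', 'b'};
  assert(tokenize(input, sizeof(input)) == 'a');
}

// ...and the derived fuzz target keeps that pattern, but replaces the
// constant input with fuzzer-controlled bytes.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  (void)&test_tokenize_basic;  // keep the sketch warning-free
  tokenize(data, size);
  return 0;
}
```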
2. Fixing any compilation issues that arise.
Once a potential target is written, a developer would try to compile it and fix any compilation issues that come up. Again, an LLM can be prompted with the details of the compilation errors so that it can fix them.
An example of compilation errors that an LLM was able to fix
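The specifics vary from project to project, but a representative (hypothetical) case looks like this: the first draft guessed a two-argument call, and feeding the resulting compiler errors back produced a call matching the real signature:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical project API; stands in for the real header whose
// signature the model's first draft did not match.
static int lib_process(const uint8_t *buf, size_t len, int mode) {
  return (len > 0 && mode >= 0) ? buf[0] : 0;
}

// The first draft called lib_process(data, size) and failed to build
// with a "too few arguments" style error. Prompted with that error,
// the model emitted a call matching the three-argument signature:
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  lib_process(data, size, /*mode=*/0);
  return 0;
}
```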
3. Running the fuzz target to see how it performs, and fixing any obvious mistakes.
Once all compilation errors are fixed, a developer would try running the fuzz target for a short period of time to check for any mistakes that cause it to instantly crash, indicating an error in the target rather than a bug discovered in the project.
Below is an example of an LLM fixing a semantic issue with the fuzzing setup:
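A typical instance of this class of fix (with a hypothetical parse_path standing in for project code): the first draft passed raw fuzzer bytes, which are not NUL-terminated, to a C-string API, so the resulting out-of-bounds read was a bug in the harness rather than in the project.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical project API that expects a NUL-terminated C string.
static void parse_path(const char *path) { (void)strlen(path); }

// Buggy draft (reads past the end of the input buffer):
//   parse_path(reinterpret_cast<const char *>(data));
//
// Corrected target: copy the input into a NUL-terminated buffer first.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string path(reinterpret_cast<const char *>(data), size);
  parse_path(path.c_str());  // c_str() guarantees the trailing '\0'
  return 0;
}
```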
4. Running the corrected fuzz target for a longer period of time, and triaging any crashes.
At this point, the fuzz target is ready to be run for an extended period of time on suitable fuzzing infrastructure, such as ClusterFuzz.
Any discovered crashes would then need to be triaged to determine their root causes and whether they represent legitimate vulnerabilities (or bugs in the fuzz target itself). An LLM can be prompted with the relevant context (stacktraces, fuzz target source code, relevant project source code) to perform this triage.
In this case, the LLM correctly identifies the crash as a bug in the fuzz target, rather than a bug in the project being fuzzed.
5. Fixing vulnerabilities.
The goal is to fully automate this entire workflow by having the LLM generate a suggested patch for the vulnerability. We don't have anything we can share here today, but we're collaborating with various researchers to make this a reality and look forward to sharing results soon.
Coming up
Improving automated triaging: to get to a point where we're confident enough not to require human review, which will enable automatically reporting new vulnerabilities to project maintainers. There are likely more than the 26 vulnerabilities we've already reported upstream hiding in our results.
Agent-based architecture: letting the LLM autonomously plan out the steps needed to solve a particular problem by giving it access to tools that let it gather more information and check and validate its results. By providing the LLM with interactive access to real tools such as debuggers, we've found that it is more likely to arrive at a correct result.
Integrating our research into OSS-Fuzz as a feature: to achieve a more fully automated end-to-end solution for vulnerability discovery and patching. We hope OSS-Fuzz will be useful for other researchers to evaluate AI-powered vulnerability discovery ideas, and that it will ultimately become a tool that enables defenders to find more vulnerabilities before they get exploited.
For more information, check out our open source framework at oss-fuzz-gen. We're hoping to continue to collaborate on this area with other researchers. Also, be sure to check out the OSS-Fuzz blog for more technical updates.
