Large-scale Automated Vulnerability Addition
A major challenge facing computer scientists is the detection of errors, or bugs, in software code. While software developers continually design new tools to find bugs, the lack of ground truth — documentation of the bugs a program actually contains and how they manifest — makes it difficult to measure the success of bug-finding tools. For example, if a tool finds 42 bugs in a program, there is no way to know whether that number represents 99 percent or 1 percent of the bugs actually present.
We developed the Large-scale Automated Vulnerability Addition (LAVA) system to enable computer scientists to test their bug-finding techniques against a large body of ground-truth data. LAVA works by automatically injecting millions of realistic bugs into program code. Once these bugs are injected, vulnerability discovery techniques can be tested to see how many of the LAVA bugs they find and how many they miss. Although artificial, the bugs introduced by LAVA are realistic in the sense that they are embedded deep within programs and are triggered by real inputs. In a preliminary test, two bug-finding tools found fewer than 2 percent of the LAVA bugs injected into a program, indicating huge potential for the improvement of vulnerability discovery tools.
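To make the idea concrete, the sketch below shows the *shape* of a LAVA-style injected bug in C. The identifier names, the magic constant, and the helper functions are illustrative assumptions, not LAVA's actual generated code: some "dead" input bytes (unused by the program's normal logic) are siphoned into global state, and at an attack point elsewhere in the program, a guard on those bytes inflates a copy length. For every ordinary input the program behaves exactly as before; only an input carrying the magic bytes triggers the buffer overflow, which is what makes the injected bug both deep and realistically triggerable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for LAVA's generated state and helpers. */
#define LAVA_MAGIC 0x6c617661u            /* the bytes "lava", little-endian */

static uint32_t lava_val;                 /* siphoned-off input bytes (a "DUA") */
static void     lava_set(uint32_t v) { lava_val = v; }
static uint32_t lava_get(void)       { return lava_val; }

/* Originally a benign copy routine; LAVA adds the guarded length term. */
static void copy_field(char *dst, const char *src, size_t n) {
    /* Injected bug: when the siphoned bytes equal LAVA_MAGIC, the copy
     * length is inflated by 64 bytes, overflowing dst. For all other
     * inputs the extra term is 0 and behavior is unchanged. */
    memcpy(dst, src, n + (lava_get() == LAVA_MAGIC) * 64);
}

int parse(const unsigned char *buf, size_t len, char *out, size_t outsz) {
    if (len < 4 || len >= outsz)
        return -1;
    /* Earlier in the program, LAVA copies otherwise-unused input bytes
     * into global state so they are still available at the attack point. */
    lava_set((uint32_t)buf[0]       | (uint32_t)buf[1] << 8 |
             (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24);
    copy_field(out, (const char *)buf, len);
    out[len] = '\0';
    return 0;
}
```

Because the trigger condition depends on concrete input bytes, the ground truth is exact: a tester knows precisely which input exposes each injected bug, so a bug-finding tool's hit rate can be measured directly.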