Evaluating static analysis tools for detecting buffer overflows in C code
Summary
This project evaluated five static analysis tools using a diagnostic test suite to determine their strengths and weaknesses in detecting a variety of buffer overflow flaws in C code. Detection, false alarm, and confusion rates were measured, along with execution time. PolySpace demonstrated a superior detection rate on the basic test suite, missing only one out of a possible 291 detections. It may benefit from improving its treatment of signal handlers, and reducing both its false alarm rate (particularly for C library functions) and execution time. ARCHER performed quite well with no false alarms whatsoever; a few key enhancements, such as in its inter-procedural analysis and handling of C library functions, would boost its detection rate and should improve its performance on real-world code. Splint detected significantly fewer overflows and exhibited the highest false alarm rate. Improvements in its loop handling, and reductions in its false alarm rate would make it a much more useful tool. UNO had no false alarms, but missed a broad variety of overflows amounting to nearly half of the possible detections in the test suite. It would need improvement in many areas to become a very useful tool. BOON was clearly at the back of the pack, not even performing well on the subset of test cases where it could have been expected to function. The project also provides a buffer overflow taxonomy, along with a test suite generator and other tools, that can be used by others to evaluate code analysis tools with respect to buffer overflow detection.