On the Reliability of Coverage-Based Fuzzer Benchmarking

Marcel Böhme; László Szekeres; Jonathan Metzman

On the Reliability of Coverage-Based Fuzzer Benchmarking

Marcel Böhme

László Szekeres

Jonathan Metzman

Proceedings of the 44th International Conference on Software Engineering (ICSE'22), IEEE (2022)

Download Google Scholar

Abstract

Given a program where none of our fuzzers finds any bugs, how do we know which fuzzer is better? In practice, we often look to code coverage as a proxy measure of fuzzer effectiveness and consider the fuzzer which achieves more coverage as the better one.
Indeed, evaluating 10 fuzzers for 23 hours on 24 programs, we find that a fuzzer that covers more code also finds more bugs. There is a very strong correlation between the coverage achieved and the number of bugs found by a fuzzer. Hence, it might seem reasonable to compare fuzzers in terms of coverage achieved, and from that derive empirical claims about a fuzzer’s superiority at finding bugs.
Curiously enough, however, we find no strong agreement on which fuzzer is superior if we compared multiple fuzzers in terms of coverage achieved instead of the number of bugs found. The fuzzer best at achieving coverage, may not be best at finding bugs.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

On the Reliability of Coverage-Based Fuzzer Benchmarking

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs