Snowcat: Efficient Kernel Concurrency Testing using a Learned Coverage Predictor
Abstract
Random-based approaches and heuristics are commonly used in kernel concurrency testing due to the massive scale of modern kernels and the correspondingly vast interleaving space. The lack of accurate and scalable approaches to analyzing concurrent kernel executions forces existing testing approaches to rely heavily on expensive dynamic executions to measure the effectiveness of a new test. Unfortunately, the high cost incurred by dynamic executions limits the breadth of the exploration and puts latency pressure on finding effective concurrent test inputs and schedules, hindering the overall testing effectiveness.
This paper proposes Snowcat, a kernel concurrency testing framework that generates effective test inputs and schedules using a learned kernel block-coverage predictor. Using a graph neural network, the coverage predictor takes a concurrent test input and scheduling hints and predicts whether certain important code blocks will be executed. Using this predictor, Snowcat can skip concurrent tests that are likely to be fruitless and prioritize promising ones for actual dynamic execution.
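The predictor-guided prioritization described above can be sketched as follows. This is a minimal illustration, not Snowcat's actual interface: the function names are hypothetical, and a deterministic toy rule stands in for the learned graph-neural-network predictor.

```python
# Hypothetical sketch of coverage-predictor-guided test prioritization.
# The predictor below is a toy stand-in for the learned GNN model.

def predict_block_coverage(test_input, schedule_hints):
    """Stand-in for the learned predictor: return the set of important
    code blocks the model expects this (input, schedule) pair to cover.
    Toy rule: a block is 'covered' when its parity matches the input's."""
    return {b for b in schedule_hints if b % 2 == test_input % 2}

def prioritize_tests(candidates, covered_blocks, budget):
    """Rank candidate (input, hints) pairs by predicted *new* block
    coverage and keep only the top `budget` for dynamic execution;
    tests predicted to add no new coverage naturally rank last."""
    scored = []
    for test_input, hints in candidates:
        predicted = predict_block_coverage(test_input, hints)
        new_blocks = predicted - covered_blocks  # fruitless tests score 0
        scored.append((len(new_blocks), test_input, hints))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return [(test_input, hints) for _, test_input, hints in scored[:budget]]
```

Under this sketch, the expensive dynamic executions are spent only on the `budget` highest-scoring candidates, which is the essence of skipping likely-fruitless tests.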
After testing the Linux kernel for over a week, Snowcat finds ∼17% more potential data races by prioritizing tests with more fruitful schedules than existing work would have chosen. Snowcat can also find effective test inputs that expose new concurrency bugs with higher probability (1.4×∼2.6×), or reproduce known bugs more quickly (15×), than state-of-the-art testing tools. More importantly, Snowcat is shown to be more efficient at reaching a desirable level of race coverage in the continuous setting, as the Linux kernel evolves from version to version. In total, Snowcat discovered 17 new concurrency bugs in Linux kernel 6.1, of which 13 are confirmed and 6 are fixed.