- Fabio Soldo
- Ahmed Metwally
In this paper we present a data-driven framework for detecting machine-generated traffic based on the IP size, i.e., the number of users sharing the same source IP. Our main observation is that diverse machine-generated traffic attacks share a common characteristic: they induce an anomalous deviation from the expected IP size distribution. We develop a principled framework that automatically detects and classifies these deviations using statistical tests and ensemble learning. We evaluate our approach on a massive dataset collected at Google for 90 consecutive days. We argue that our approach combines desirable characteristics: it can accurately detect fraudulent machine-generated traffic; it is based on a fundamental characteristic of these attacks and is thus robust (e.g., to DHCP re-assignment) and hard to evade; it has low complexity and is easy to parallelize, making it suitable for large-scale detection; and finally, it does not entail profiling users, but leverages only aggregate statistics of network traffic.