Google Research

Traffic Anomaly Detection Based on the IP Size Distribution

  • Fabio Soldo
  • Ahmed Metwally
INFOCOM International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, IEEE (2012), pp. 2005-2013


In this paper we present a data-driven framework for detecting machine-generated traffic based on the IP size, i.e., the number of users sharing the same source IP. Our main observation is that diverse machine-generated traffic attacks share a common characteristic: they induce an anomalous deviation from the expected IP size distribution. We develop a principled framework that automatically detects and classifies these deviations using statistical tests and ensemble learning. We evaluate our approach on a massive dataset collected at Google for 90 consecutive days. We argue that our approach combines desirable characteristics: it can accurately detect fraudulent machine-generated traffic; it is based on a fundamental characteristic of these attacks and is thus robust (e.g., to DHCP re-assignment) and hard to evade; it has low complexity and is easy to parallelize, making it suitable for large-scale detection; and finally, it does not entail profiling users, but leverages only aggregate statistics of network traffic.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work