Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

Branislav Kveton

Jehangir Amjad

Christophe Diot

Dimitris Konomis

Augustin Soule

Shawn Yang

Computer Communication, 192 (2022), pp. 119-131 (to appear)

Download Google Scholar

Abstract

Monitoring large-scale cloud networks is a complex task because their scale is prohibitively large, monitoring budgets are limited, network topologies are not entirely regular and the estimates produced are a function of traffic patterns. In this work, we take a statistical approach to estimating a network metric, such as the latency of a set of paths, with guarantees on the estimation error. We aim to do so in an intelligent and scalable manner, without observing all existing traffic, and minimizing the estimation error at a fixed probing budget per unit of time. Our proposed algorithms produce a distribution of probes/samples across network paths which can be used in conjunction with existing probers (or samplers). These algorithms are based on A- and E-optimal experimental designs in statistics, which guarantee a bounded estimation error for any monitoring budget. Unfortunately, these designs are too computationally intensive to be used in production at scale. We propose a scalable and near-optimal approximate implementations based on the Frank-Wolfe algorithm. We validate our approaches with two metrics (latency and loss) in simulations on real network topologies, and also using a production probing system in a real cloud network. We show major gains in reducing the probing budget compared to both production and academic baselines, while maintaining low errors in estimates, even with very low probing budgets.

Research Areas

Networking

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

Abstract

Research Areas

Meet the teams driving innovation