Understanding Host Interconnect Congestion

Khaled Elmeleegy
Masoud Moshref
Rachit Agarwal
Saksham Agarwal
Sylvia Ratnasamy
Association for Computing Machinery, New York, NY, USA (2022), 198–204


We present evidence and a characterization of host congestion in production clusters: the adoption of high-bandwidth access links is leading to the emergence of bottlenecks within the host interconnect (the NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delay and eventual packet drops at hosts, even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion. We also discuss the implications of host interconnect congestion for the design of future host architectures, network stacks, and network protocols.

