Sridhar Srinivasan
Research Areas
Authored Publications
Sort By
LatLong: Diagnosing Wide-Area Latency Changes for CDNs
Yaping Zhu
Benjamin Helsley
Jennifer Rexford
IEEE Transactions on Network and Service Management, 9 (2012) (to appear)
Preview abstract
Minimizing user-perceived latency is crucial for Content Distribution Networks (CDNs) hosting interactive services. Latency may increase for many reasons, such as interdomain routing changes and the CDN's own load-balancing policies. CDNs need greater visibility into the causes of latency increases, so they can adapt by directing traffic to different servers or paths. In this paper, we propose techniques for CDNs to diagnose large latency increases, based on passive measurements of performance, traffic, and routing. Separating the many causes from the effects is challenging. We propose a decision tree for classifying
latency changes, and determine how to distinguish traffic shifts from increases in latency for existing servers, routers, and paths. Another challenge is that network operators group related clients to reduce measurement and control overhead, but the clients in a region may use multiple servers and paths during a measurement interval. We propose metrics that quantify the latency contributions
across sets of servers and routers. Analyzing a month of data from Google's CDN, we find that nearly 1% of the
daily latency changes increase delay by more than 100 msec. More than 40% of these increases coincide with interdomain routing changes, and more than one-third involve a shift in traffic to different servers. This is the first work to diagnose latency problems in a large, operational CDN from purely passive measurements. Through case studies of individual events, we identify research challenges for measuring and managing wide-area latency for CDNs.
View details
Moving Beyond End-to-End Path Information to Optimize CDN Performance
Harsha V. Madhyastha
Sushant Jain
Arvind Krishnamurthy
Thomas Anderson
Jie Gao
Internet Measurement Conference (IMC), ACM, Chicago, IL (2009), pp. 190-201
Preview abstract
Replicating content across a geographically distributed set of servers and redirecting clients to the closest server in terms of latency has emerged as a common paradigm for improving client performance. In this paper, we analyze latencies measured from servers in Google’s content distribution network (CDN) to clients all across the Internet to study the effectiveness of latency-based server selection. Our main result is that redirecting every client to the server with least latency does not suffice to optimize client latencies. First, even though most clients are served by a geographically nearby CDN node, a sizeable fraction of clients experience latencies several tens of milliseconds higher than other clients in the same region. Second, we find that queueing delays often override the benefits of a client interacting with a nearby server.
To help the administrators of Google’s CDN cope with these
problems, we have built a system called WhyHigh. First, WhyHigh measures client latencies across all nodes in the CDN and correlates measurements to identify the prefixes affected by inflated latencies. Second, since clients in several thousand prefixes have poor latencies, WhyHigh prioritizes problems based on the impact that solving them would have, e.g., by identifying either an AS path common
to several inflated prefixes or a CDN node where path inflation is widespread. Finally, WhyHigh diagnoses the causes for inflated latencies using active measurements such as traceroutes and pings, in combination with datasets such as BGP paths and flow records. Typical causes discovered include lack of peering, routing misconfigurations, and side-effects of traffic engineering. We have used WhyHigh to diagnose several instances of inflated latencies, and our efforts over the course of a year have significantly helped improve the performance offered to clients by Google’s CDN.
An anonymized data set is available for download.
View details