And the award goes to...

April 5, 2017

Posted by Evgeniy Gabrilovich, Senior Staff Research Scientist, Google Research, and WWW 2017 Technical Program Co-Chair



Today, Google's Andrei Broder, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins, along with their coauthors, Farzin Maghoul, Raymie Stata, and Janet Wiener, have received the prestigious 2017 Seoul Test of Time Award for their classic paper “Graph Structure in the Web”. This award is given to the authors of a previous World Wide Web conference paper that has demonstrated significant scientific, technical, or social impact over the years. The first award, introduced in 2015, was given to Google founders Larry Page and Sergey Brin.

Originally presented in 2000 at the 9th WWW conference in Amsterdam, “Graph Structure in the Web” represents the seminal study of the structure of the World Wide Web. At the time of publication, it received the Best Paper Award from the WWW conference, and in the following 17 years proved to be highly influential, accumulating over 3,500 citations.

The paper made two major contributions to the study of the structure of the Web. First, it reported the results of a very large-scale experiment confirming that the indegree of Web nodes follows a power law: the probability that a node of the Web graph has i incoming links is roughly proportional to 1/i^2.1. Second, in contrast to previous research that assumed the Web to be almost fully connected, “Graph Structure in the Web” described a much more elaborate structure, which has since been depicted with the iconic “bowtie” shape:
Original “bowtie” schematic from “Graph Structure in the Web”
The authors presented a refined model of the Web graph, and described several characteristic classes of Web pages:
  • the strongly connected core component, where each page is reachable from any other page,
  • the so-called IN and OUT clusters, which only have unidirectional paths to or from the core,
  • tendrils dangling from the two clusters, and tubes connecting the clusters while bypassing the core, and finally
  • disconnected components, which are isolated from the rest of the graph.
Whereas the core component is strongly connected, so that each node can be reached from any other node, Broder et al. discovered that the Web as a whole is much more loosely connected than previously believed: the probability that one randomly chosen page can be reached from another by following links is just under 1/4.
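To make the bowtie classes concrete, here is a minimal Python sketch that splits a tiny directed graph into a core, IN, OUT, and everything else, and then measures the fraction of ordered page pairs joined by a directed path. The nine-page graph, the function names, and the brute-force reachability approach are all assumptions chosen for illustration; this is not the method used in the paper, and it would not scale to a real crawl.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (nodes, forward adjacency, backward adjacency) for directed edges."""
    nodes, fwd, bwd = set(), defaultdict(set), defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        fwd[src].add(dst)
        bwd[dst].add(src)
    return nodes, fwd, bwd


def reachable(adj, start):
    """Return every node reachable from start by following adj, including start."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bowtie(edges):
    """Split a non-empty edge list into CORE, IN, OUT and OTHER node sets."""
    nodes, fwd, bwd = build_graph(edges)
    core = set()
    for v in nodes:
        if v in core:
            continue
        # The strongly connected component of v is everything that v can
        # reach and that can also reach v.
        scc = reachable(fwd, v) & reachable(bwd, v)
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    in_side = reachable(bwd, seed) - core   # can reach the core, not vice versa
    out_side = reachable(fwd, seed) - core  # reachable from the core only
    other = nodes - core - in_side - out_side  # tendrils, tubes, disconnected
    return {"CORE": core, "IN": in_side, "OUT": out_side, "OTHER": other}


def path_fraction(edges):
    """Fraction of ordered pairs (u, v), u != v, with a directed path u -> v."""
    nodes, fwd, _ = build_graph(edges)
    n = len(nodes)
    connected = sum(len(reachable(fwd, v)) - 1 for v in nodes)
    return connected / (n * (n - 1))


# Hypothetical toy Web: page names are arbitrary labels.
EDGES = [
    ("a", "b"),                          # IN page linking into the core
    ("b", "c"), ("c", "d"), ("d", "b"),  # core cycle
    ("d", "e"),                          # core links out to an OUT page
    ("a", "t"),                          # tendril hanging off IN
    ("a", "u"), ("u", "e"),              # tube from IN to OUT, bypassing the core
    ("x", "y"),                          # disconnected component
]

parts = bowtie(EDGES)
fraction = path_fraction(EDGES)
for name, members in parts.items():
    print(name, sorted(members))
print("fraction of ordered pairs joined by a path:", round(fraction, 3))
```

Even in this toy graph, most ordered pairs of pages have no directed path between them, because paths only run from IN through the core to OUT and never back.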
Ravi Kumar, presenting the original paper in Amsterdam at WWW 2000
Curiously, the original study was done back in 1999 on two AltaVista crawls with 200 million pages and 1.5 billion links. Today, Google indexes over 100 billion links merely within apps, and overall processes over 130 trillion web addresses in its web crawls.

Over the years, the power law was found to be characteristic of many other Web-related phenomena, including the structure of social networks and the distribution of search query frequencies. The description of the macroscopic structure of the Web graph proposed by Broder et al. provided a solid mathematical foundation for numerous subsequent studies on crawling and searching the Web, which profoundly influenced the architecture of modern search engines.
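For readers who want to see what the indegree power law implies numerically, the following Python sketch builds an idealized distribution with P(i) proportional to 1/i^2.1 and recovers the exponent as the slope of a log-log fit. The cutoff of 1,000 incoming links and the function names are assumptions made for illustration; real crawl data is noisy, and this is not the fitting procedure from the paper.

```python
import math

EXPONENT = 2.1  # indegree exponent quoted in this post


def power_law_pmf(max_degree, exponent=EXPONENT):
    """P(i) proportional to i**-exponent for i = 1..max_degree (assumed cutoff)."""
    weights = [i ** -exponent for i in range(1, max_degree + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loglog_slope(pmf):
    """Least-squares slope of log P(i) against log i."""
    xs = [math.log(i) for i in range(1, len(pmf) + 1)]
    ys = [math.log(p) for p in pmf]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


pmf = power_law_pmf(max_degree=1000)
slope = loglog_slope(pmf)
ratio = pmf[0] / pmf[1]  # pages with 1 incoming link vs. pages with 2

print("log-log slope:", round(slope, 3))
print("P(1) / P(2):", round(ratio, 3))
```

On a log-log plot such a distribution is a straight line whose slope is the negative of the exponent, and doubling the number of incoming links makes a page roughly 4.3 times rarer (2^2.1).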

Hearty congratulations to all the authors on the well-deserved award!