# Andrew Tomkins

Personal page with copies of most papers: http://www.tomkins.family/andrew

Authored Publications

Google Publications

Other Publications

Sort By

An Adversarial Variational Inference Approach for Travel Demand Calibration of Urban Traffic Simulators

Martin Mladenov

Proceedings of the 30th ACM SIGSPATIAL Intl. Conf. on Advances in Geographic Information Systems (SIGSPATIAL-22), Seattle, WA (2022) (to appear)

Preview abstract
This paper considers the calibration of travel demand inputs, defined as a set of origin-destination matrices (ODs), for stochastic microscopic urban traffic simulators. The goal of calibration is to find a (set of) travel demand input(s) that replicate sparse field count data statistics. While traditional approaches use only first-order moment information from the field data, it is well known that the OD calibration problem is underdetermined in realistic networks. We study the value of using higher-order statistics from spatially sparse field data to mitigate underdetermination, proposing a variational inference technique that identifies an OD distribution. We apply our approach to a high-dimensional setting in Salt Lake City, Utah. Our approach is flexibleâ€”it can be readily extended to account for arbitrary types of field data (e.g., road, path or trip data).
View details

An Efficient Simulation-Based Travel Demand Calibration Algorithm for Large-Scale Metropolitan Traffic Models

Yechen Li

Yi-fan Chen

Ziheng Lin

(2021) (to appear)

Preview abstract
Metropolitan scale vehicular traffic modeling is used by a variety of private and public sector urban mobil-ity stakeholders to inform the design and operations of road networks. High-resolution stochastic traffic simulators are increasingly used to describe detailed demand-supply interactions. The design of efficient calibration techniques remains a major challenge. This paper considers a class of high-dimensional calibration problems known as origin-destination (OD) calibration. We formulate the problem as a continuous simulation-based optimization problem. Our proposed algorithm builds upon recent metamodel methods that tackle the simulation-based problem by solving a sequence of approximate analytical optimization problems, which rely on the use of analytical network models. In this paper, we formulate a network model defined as a system of linear equations, the dimension of which scales linearly with the number of roads in the network and independently of the dimension of the route choice set. This makes the approach suitable for large-scale metropolitan networks. The approach has enhanced efficiency compared with past metamodel formulations that are based on systems of nonlinear, rather than linear, equations. It also has enhanced efficiency compared to traditional calibration methods that resort to simulation-based estimates of traffic assignment matrices, while the proposed approach uses analytical approximations of these matrices. We benchmark the approach considering a peak period Salt Lake City case study and calibrate based on field vehicular count data. The new formulation yields solutions with good performance, reduces the compute time needed, is suitable for large-scale road networks, and can be readily extended to account for other types of field data sources.
View details

Quantifying the sustainability impact of Google Maps: A case study of Salt Lake City

Theophile Cabannes

Yechen Li

Preston McAfee

2021 (2021) (to appear)

Preview abstract
Google Maps uses current and historical traffic trends to provide routes to drivers. In this paper, we use microscopic traffic simulation to quantify the improvements to both travel time and CO2 emissions from Google Maps real-time navigation. A case study in Salt Lake City shows that Google Maps users are, on average, saving 1.7% of CO2 emissions and 6.5% travel time. If we restrict to the users for which Google Maps finds a different route than their original route, the average savings are 3.4% of CO2 emissions and 12.5% of travel time. These results are based on traffic conditions observed during the Covid-19 pandemic. As congestion gradually builds back up to pre-pandemic levels, it is expected to lead to even greater savings in emissions.
View details

Graph-RISE: Graph-Regularized Image Semantic Embedding

Aleksei Timofeev

Futang Peng

Krishnamurthy Viswanathan

Lucy Gao

Sujith Ravi

Yi-ting Chen

Zhen Li

The 12th International Conference on Web Search and Data Mining (2020) (to appear)

Preview abstract
Learning image representation to capture instance-based semantics has been a challenging and important task for enabling many applications such as image search and clustering. In this paper, we explore the limits of image embedding learning at unprecedented scale and granularity. We present Graph-RISE, an image embedding that captures very fine-grained, instance-level semantics. Graph-RISE is learned via a large-scale, neural graph learning framework that leverages graph structure to regularize the training of deep neural networks. To the best of our knowledge, this is the first work that can capture instance-level image semantics at millionâ€”O(40M)â€”scale. Experimental results show that Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We also provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE well captures the semantics and differentiates nuances at instance level.
View details

Graph Agreement Models for Semi-supervised Learning

Krishnamurthy Viswanathan

Anthony Platanios

Sujith Ravi

Proceedings of the Thirty-third Conference on Neural Information Processing Systems, Neurips 2019

Preview abstract
Graph-based algorithms are among the most successful paradigms for solving semi-supervised learning tasks. Recent work on graph convolutional networks and neural graph learning methods has successfully combined the expressiveness of neural networks with graph structures. We propose a technique that, when applied to these methods, achieves state-of-the-art results on semi-supervised learning datasets. Traditional graph-based algorithms, such as label propagation, were designed with the underlying assumption that the label of a node can be imputed from that of the neighboring nodes. However, real-world graphs are either noisy or have edges that do not correspond to label agreement. To address this, we propose Graph Agreement Models (GAM), which introduces an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features. The agreement model is used when training a node classification model by encouraging agreement only for the pairs of nodes it deems likely to have the same label, thus guiding its parameters to better local optima. The classification and agreement models are trained jointly in a co-training fashion. Moreover, GAM can also be applied to any semi-supervised classification problem, by inducing a graph whenever one is not provided. We demonstrate that our method achieves a relative improvement of up to 72% for various node classification models, and obtains state-of-the-art results on multiple established datasets.
View details

Orienteering Algorithms for Generating Travel Itineraries

Zachary Friggstad

Chaitanya Swamy

WSDM (2018)

Preview abstract
We study the problem of automatically and efficiently generating itineraries
for users who are on vacation. We focus on the common case, wherein the trip
duration is more than a single day. Previous efficient algorithms based on
greedy heuristics suffer from two problems. First, the itineraries are often
unbalanced, with excellent days visiting top attractions followed by days of
exclusively lower-quality alternatives. Second, the trips often re-visit
neighborhoods repeatedly in order to cover increasingly low-tier points of
interest. Our primary technical contribution is an algorithm that addresses
both these problems by maximizing the quality of the worst day. We give
theoretical results showing that this algorithm's competitive factor is within
a factor two of the guarantee of the best available algorithm for a single day,
across many variations of the problem. We also give detailed empirical
evaluations using two distinct datasets: (a) anonymized Google historical visit
data and (b) Foursquare public check-in data. We show first that the overall
utility of our itineraries is almost identical to that of algorithms
specifically designed to maximize total utility, while the utility of the worst
day of our itineraries is roughly twice that obtained from other approaches.
We then turn to evaluation based on human raters who score our itineraries only
slightly below the itineraries created by human travel experts with deep
knowledge of the area.
View details

Smart Reply: Automated Response Suggestion for Email

Karol Kurach

Sujith Ravi

Tobias Kaufman

Laszlo Lukacs

Peter Young

Vivek Ramavajjala

Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2016).

Preview abstract
In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning.
We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data.
View details

Arrival and departure in Social Networks

Preview
Shaomei Wu

Atish Das Sarma

Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013

Online Selection of Diverse Results

Debmalya Panigrahi

Atish Das Sarma

Proceedings of the 5th ACM international Conference on Web Search and Data Mining (2012), pp. 263-272

Preview abstract
The phenomenal growth in the volume of easily accessible information via various web-based services has made it essential for service providers to provide users with personalized representative summaries of such information. Further, online commercial services including social networking and micro-blogging websites, e-commerce portals, leisure and entertainment websites, etc. recommend interesting content to users that is simultaneously diverse on many different axes such as topic, geographic specificity, etc.
The key algorithmic question in all these applications is the generation of a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm. We analyze the algorithm using theoretical techniques to show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that our algorithm performs even better than its analytical guarantees in practice, and also outperforms other candidate algorithms for the problem by a wide margin.
View details

Max-Cover in Map-Reduce

Flavio Chierichetti

Ravi Kumar

Proceedings of the 19th international conference on World Wide Web, ACM, Raleigh, North Carolina (2010), pp. 231-240

Preview abstract
The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of the union. This classic problem occurs commonly in many settings in web search and advertising. For moderately-sized instances, a greedy algorithm gives an approximation of (1-1/e). However, the greedy algorithm requires updating scores of arbitrary elements after each step, and hence becomes intractable for large datasets.
We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm.
View details

Stochastic Models for Tabbed Browsing

Preview
Flavio Chierichetti

Ravi Kumar

Proceedings of the 19th international conference on World Wide Web, ACM, Raleigh, North Carolina (2010), pp. 241-250

Dense Subgraph Extraction

Preview
David Gibson

Ravi Kumar

Kevin S. McCurley

in: Mining Graph Data, John Wiley & Sons (2006), pp. 411-441

Evolution of two-sided markets

Stochastic models for tabbed browsing

Max-cover in map-reduce

A characterization of online browsing behavior

Search is dead!: long live search

Elizabeth F. Churchill

Marti Hearst

Barney Pell

WWW (2010), pp. 1337-1338

ShatterPlots: Fast Tools for Mining Large Graphs

Ana Paula Appel

Deepayan Chakrabarti

Christos Faloutsos

Ravi Kumar

Jure Leskovec

SDM (2009), pp. 802-813

A translation model for matching reviews to objects

Matching Reviews to Objects using a Language Model

A Characterization of Online Search Behavior

For a few dollars less: Identifying review pages sans human labels

A web of concepts

Nilesh N. Dalvi

Ravi Kumar

Bo Pang

Raghu Ramakrishnan

Philip Bohannon

Sathiya Keerthi

Srujana Merugu

PODS (2009), pp. 1-12

An analysis framework for search sequences

Preferential behavior in online groups

Pig latin: a not-so-foreign language for data processing

Christopher Olston

Benjamin Reed

Utkarsh Srivastava

Ravi Kumar

SIGMOD Conference (2008), pp. 1099-1110

Social networks: looking ahead

Ravi Kumar

Alexander Tuzhilin

Christos Faloutsos

David Jensen

Gueorgi Kossinets

Jure Leskovec

KDD (2008), pp. 1060

Relaxation in text search using taxonomies

Marcus Fontoura

Vanja Josifovski

Ravi Kumar

Christopher Olston

PVLDB, vol. 1 (2008), pp. 672-683

Connectivity structure of bipartite graphs via the KNC-plot

Microscopic evolution of social networks

Efficient Discovery of Authoritative Resources

Vanity fair: privacy in querylog bundles

"I know what you did last summer": query logs and user privacy

Anchor-based proximity measures

The discoverability of the web

Anirban Dasgupta

Arpita Ghosh

Ravi Kumar

Christopher Olston

Sandeep Pandey

WWW (2007), pp. 421-430

Visualizing tags over time

Micah Dubinko

Ravi Kumar

Joseph Magnani

Jasmine Novak

TWEB, vol. 1 (2007)

On anonymizing query logs via token-based hashing

Hierarchical topic segmentation of websites

Visualizing tags over time

Micah Dubinko

Ravi Kumar

Joseph Magnani

Jasmine Novak

WWW (2006), pp. 193-202

Estimating corpus size via queries

Marcus Fontoura

Vanja Josifovski

Ravi Kumar

Rajeev Motwani

Shubha U. Nabar

Rina Panigrahy

Ying Xu 0002

CIKM (2006), pp. 594-603

Content, Metadata, and Behavioral Information: Directions for Yahoo! Research

Navigating Low-Dimensional and Hierarchical Population Networks

Core algorithms in the CLEVER system

Ravi Kumar

Sridhar Rajagopalan

ACM Trans. Internet Techn., vol. 6 (2006), pp. 131-152

Evolutionary clustering

Structure and evolution of online social networks

Multi-structural databases

Efficient Implementation of Large-Scale Multi-Structural Databases

Ronald Fagin

Phokion G. Kolaitis

Ravi Kumar

Jasmine Novak

D. Sivakumar

VLDB (2005), pp. 958-969

On the Bursty Evolution of Blogspace

Ravi Kumar

Jasmine Novak

World Wide Web, vol. 8 (2005), pp. 159-178

The predictive power of online chatter

Discovering Large Dense Subgraphs in Massive Graphs

Variable latent semantic indexing

Minimizing Wirelength in Zero and Bounded Skew Clock Trees

Moses Charikar

Jon M. Kleinberg

Ravi Kumar

Sridhar Rajagopalan

Amit Sahai

SIAM J. Discrete Math., vol. 17 (2004), pp. 582-595

Sic transit gloria telae: towards an understanding of the web's decay

Fast discovery of connection subgraphs

Propagation of trust and distrust

Structure and evolution of blogspace

Mining and Knowledge Discovery from the Web

Anti-aliasing on the web

On the bursty evolution of blogspace

A case for automated large-scale semantic annotation

Stephen Dill

Nadav Eiron

David Gibson

Daniel Gruhl

Anant Jhingran

Tapas Kanungo

Kevin S. McCurley

Sridhar Rajagopalan

John A. Tomlin

Jason Y. Zien

J. Web Sem., vol. 1 (2003), pp. 115-132

SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

Stephen Dill

Nadav Eiron

David Gibson

Daniel Gruhl

Anant Jhingran

Tapas Kanungo

Sridhar Rajagopalan

John A. Tomlin

Jason Y. Zien

WWW (2003), pp. 178-186

The Web and Social Networks

Ravi Kumar

Sridhar Rajagopalan

IEEE Computer, vol. 35 (2002), pp. 32-36

Self-similarity in the web

Stephen Dill

Ravi Kumar

Kevin S. McCurley

Sridhar Rajagopalan

D. Sivakumar

ACM Trans. Internet Techn., vol. 2 (2002), pp. 205-223

Self-similarity in the Web

Stephen Dill

Ravi Kumar

Kevin S. McCurley

Sridhar Rajagopalan

D. Sivakumar

VLDB (2001), pp. 69-78

On Semi-Automated Web Taxonomy Construction

Recommendation Systems: A Probabilistic Analysis

Ravi Kumar

Sridhar Rajagopalan

J. Comput. Syst. Sci., vol. 63 (2001), pp. 42-61

Graph structure in the Web

Ravi Kumar

Farzin Maghoul

Sridhar Rajagopalan

Raymie Stata

Janet L. Wiener

Computer Networks, vol. 33 (2000), pp. 309-320

Random walks with ``back buttons'' (extended abstract)

Ronald Fagin

Anna R. Karlin

Jon M. Kleinberg

Sridhar Rajagopalan

Ronitt Rubinfeld

Madhu Sudan

STOC (2000), pp. 484-493

Random graph models for the web graph

Ravi Kumar

Sridhar Rajagopalan

D. Sivakumar

Eli Upfal

FOCS (2000), pp. 57-65

The Web as a Graph

Ravi Kumar

Sridhar Rajagopalan

D. Sivakumar

Eli Upfal

PODS (2000), pp. 1-10

Mining the Web's Link Structure

Soumen Chakrabarti

Byron Dom

Ravi Kumar

Sridhar Rajagopalan

David Gibson

Jon M. Kleinberg

IEEE Computer, vol. 32 (1999), pp. 60-67

Topic Distillation and Spectral Filtering

Soumen Chakrabarti

Byron Dom

David Gibson

Ravi Kumar

Sridhar Rajagopalan

Artif. Intell. Rev., vol. 13 (1999), pp. 409-435

Trawling the Web for Emerging Cyber-Communities

Ravi Kumar

Sridhar Rajagopalan

Computer Networks, vol. 31 (1999), pp. 1481-1493

The Web as a Graph: Measurements, Models, and Methods

Jon M. Kleinberg

Ravi Kumar

Sridhar Rajagopalan

COCOON (1999), pp. 1-17

Minimizing Wirelength in Zero and Bounded Skew Clock Trees

Moses Charikar

Jon M. Kleinberg

Ravi Kumar

Sridhar Rajagopalan

Amit Sahai

SODA (1999), pp. 177-184

On targeting Markov segments

Moses Charikar

Ravi Kumar

Sridhar Rajagopalan

STOC (1999), pp. 99-108

Extracting Large-Scale Knowledge Bases from the Web

Recommendation Systems: A Probabilistic Analysis

A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching

Tracy Kimbrel

R. Hugo Patterson

Brian N. Bershad

Pei Cao

Edward W. Felten

Garth A. Gibson

Anna R. Karlin

Kai Li

OSDI (1996), pp. 19-34