Jump to Content
Andrew Tomkins

Andrew Tomkins

Personal page with copies of most papers: http://www.tomkins.family/andrew
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    An Adversarial Variational Inference Approach for Travel Demand Calibration of Urban Traffic Simulators
    Martin Mladenov
    Proceedings of the 30th ACM SIGSPATIAL Intl. Conf. on Advances in Geographic Information Systems (SIGSPATIAL-22), Seattle, WA (2022) (to appear)
    Preview abstract This paper considers the calibration of travel demand inputs, defined as a set of origin-destination matrices (ODs), for stochastic microscopic urban traffic simulators. The goal of calibration is to find a (set of) travel demand input(s) that replicate sparse field count data statistics. While traditional approaches use only first-order moment information from the field data, it is well known that the OD calibration problem is underdetermined in realistic networks. We study the value of using higher-order statistics from spatially sparse field data to mitigate underdetermination, proposing a variational inference technique that identifies an OD distribution. We apply our approach to a high-dimensional setting in Salt Lake City, Utah. Our approach is flexible—it can be readily extended to account for arbitrary types of field data (e.g., road, path or trip data). View details
    Preview abstract Large generative language models such as GPT-2 are well-known for not only their ability to generate highly realistic text but also in their utility for common downstream tasks. However, how and in what settings one can best leverage these powerful language models is still a nascent research question. In this work, we explore their use in predicting ``language quality'', a notion of coherence and understandability of text. Our key finding is that, when trained in a self-discriminating fashion, large language models emerge as unsupervised predictors for such language quality. This enables fast bootstrapping of quality indicators in a low-resource setting. We conduct extensive qualitative and quantitative analysis over 500 million web articles, the largest-scale study conducted on this topic. View details
    Preview abstract Metropolitan scale vehicular traffic modeling is used by a variety of private and public sector urban mobil-ity stakeholders to inform the design and operations of road networks. High-resolution stochastic traffic simulators are increasingly used to describe detailed demand-supply interactions. The design of efficient calibration techniques remains a major challenge. This paper considers a class of high-dimensional calibration problems known as origin-destination (OD) calibration. We formulate the problem as a continuous simulation-based optimization problem. Our proposed algorithm builds upon recent metamodel methods that tackle the simulation-based problem by solving a sequence of approximate analytical optimization problems, which rely on the use of analytical network models. In this paper, we formulate a network model defined as a system of linear equations, the dimension of which scales linearly with the number of roads in the network and independently of the dimension of the route choice set. This makes the approach suitable for large-scale metropolitan networks. The approach has enhanced efficiency compared with past metamodel formulations that are based on systems of nonlinear, rather than linear, equations. It also has enhanced efficiency compared to traditional calibration methods that resort to simulation-based estimates of traffic assignment matrices, while the proposed approach uses analytical approximations of these matrices. We benchmark the approach considering a peak period Salt Lake City case study and calibrate based on field vehicular count data. The new formulation yields solutions with good performance, reduces the compute time needed, is suitable for large-scale road networks, and can be readily extended to account for other types of field data sources. View details
    Preview abstract Google Maps uses current and historical traffic trends to provide routes to drivers. In this paper, we use microscopic traffic simulation to quantify the improvements to both travel time and CO2 emissions from Google Maps real-time navigation. A case study in Salt Lake City shows that Google Maps users are, on average, saving 1.7% of CO2 emissions and 6.5% travel time. If we restrict to the users for which Google Maps finds a different route than their original route, the average savings are 3.4% of CO2 emissions and 12.5% of travel time. These results are based on traffic conditions observed during the Covid-19 pandemic. As congestion gradually builds back up to pre-pandemic levels, it is expected to lead to even greater savings in emissions. View details
    Preview abstract Work in information retrieval has traditionally been focused on ranking and relevance: for a user's query, fetch some number of results, ordered by relevance to the user. However, the problem of determining how many results to return, i.e. how to optimally truncate the ranked result list, has received far less attention despite being of critical importance in a range of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results with the user cost of processing more results. In this work, we propose Choppy, an assumption-free model based on the widely successful Transformer architecture in NLP, to the ranked-list truncation problem. Needing nothing more than the relevance scores of the results, the model uses a powerful multi-head attention mechanism to directly optimize any user-defined target IR metric. We show Choppy improves upon recent, state-of-the-art baselines on Robust04. View details
    Preview abstract This paper seeks to develop a deeper understanding of the fundamental properties of neural text generations models. Concretely, the study of artifacts that emerge in machine generated text as a result of modeling choices is a nascent research area. To this end, the extent and degree to which these artifacts surface in generated text is still unclear. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated some piece of text. Specifically, we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-$k$ probabilities, model architectures, etc.) leave detectable artifacts in the text they generate. Our key finding, which is backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred by looking at generated text alone. This suggests that neural text generators may actually be more sensitive to various modeling choices than previously thought. View details
    Graph-RISE: Graph-Regularized Image Semantic Embedding
    Aleksei Timofeev
    Futang Peng
    Krishnamurthy Viswanathan
    Lucy Gao
    Sujith Ravi
    Yi-ting Chen
    Zhen Li
    The 12th International Conference on Web Search and Data Mining (2020) (to appear)
    Preview abstract Learning image representation to capture instance-based semantics has been a challenging and important task for enabling many applications such as image search and clustering. In this paper, we explore the limits of image embedding learning at unprecedented scale and granularity. We present Graph-RISE, an image embedding that captures very fine-grained, instance-level semantics. Graph-RISE is learned via a large-scale, neural graph learning framework that leverages graph structure to regularize the training of deep neural networks. To the best of our knowledge, this is the first work that can capture instance-level image semantics at million—O(40M)—scale. Experimental results show that Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We also provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE well captures the semantics and differentiates nuances at instance level. View details
    Graph Agreement Models for Semi-supervised Learning
    Krishnamurthy Viswanathan
    Anthony Platanios
    Sujith Ravi
    Proceedings of the Thirty-third Conference on Neural Information Processing Systems, Neurips 2019
    Preview abstract Graph-based algorithms are among the most successful paradigms for solving semi-supervised learning tasks. Recent work on graph convolutional networks and neural graph learning methods has successfully combined the expressiveness of neural networks with graph structures. We propose a technique that, when applied to these methods, achieves state-of-the-art results on semi-supervised learning datasets. Traditional graph-based algorithms, such as label propagation, were designed with the underlying assumption that the label of a node can be imputed from that of the neighboring nodes. However, real-world graphs are either noisy or have edges that do not correspond to label agreement. To address this, we propose Graph Agreement Models (GAM), which introduces an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features. The agreement model is used when training a node classification model by encouraging agreement only for the pairs of nodes it deems likely to have the same label, thus guiding its parameters to better local optima. The classification and agreement models are trained jointly in a co-training fashion. Moreover, GAM can also be applied to any semi-supervised classification problem, by inducing a graph whenever one is not provided. We demonstrate that our method achieves a relative improvement of up to 72% for various node classification models, and obtains state-of-the-art results on multiple established datasets. View details
    Preview abstract In this paper we study the well-known family of Random Utility Models, developed over 50 years ago to codify rational user behavior in choosing one item from a finite set of options. In this setting each user draws i.i.d. from some distribution a utility function mapping each item in the universe to a real-valued utility. The user is then offered a subset of the items, and selects the one of maximum utility. A Max-Dist oracle for this choice model takes any subset of items and returns the probability (over the distribution of utility functions) that each will be selected. A discrete choice algorithm, given access to a Max-Dist oracle, must return a function that approximates the oracle. We show three primary results. First, we show that any algorithm exactly reproducing the oracle must make exponentially many queries. Second, we show an equivalent representation of the distribution over utility functions, based on permutations, and show that if this distribution has support size k, then it is possible to approximate the oracle using O(nk) queries. Finally, we consider settings in which the subset of items is always small. We give an algorithm that makes less than n^{(1–∊/2)K} queries, each to sets of size at most (1–∊/2)K, in order to approximate the Max-Dist oracle on every set of size |T| ≤ K with statistical error at most ∊. In contrast, we show that any algorithm that queries for subsets of size 2^{O(sqrt{log n})} must make maximal statistical error on some large sets. View details
    Learning a Mixture of Two Multinomial Logits
    Flavio Chierichetti
    Ravi Kumar
    ICML (2018)
    Preview abstract The classical Multinomial Logit (MNL) is a behavioral model for user choice. In this model, a user is offered a slate of choices (a subset of a finite universe of n items), and selects exactly one item from the slate, each with probability proportional to its (positive) weight. Given a set of observed slates and choices, the likelihood-maximizing item weights are easy to learn at scale, and easy to interpret. However, the model fails to represent common real-world behavior. As a result, researchers in user choice often turn to mixtures of MNLs, which are known to approximate a large class of models of rational user behavior. Unfortunately, the only known algorithms for this problem have been heuristic in nature. In this paper we give the first polynomial-time algorithms for exact learning of uniform mixtures of two MNLs. Interestingly, the parameters of the model can be learned for any n by sampling the behavior of random users only on slates of sizes 2 and 3; in contrast, we show that slates of size 2 are insufficient by themselves. View details
    Preview abstract We study the problem of automatically and efficiently generating itineraries for users who are on vacation. We focus on the common case, wherein the trip duration is more than a single day. Previous efficient algorithms based on greedy heuristics suffer from two problems. First, the itineraries are often unbalanced, with excellent days visiting top attractions followed by days of exclusively lower-quality alternatives. Second, the trips often re-visit neighborhoods repeatedly in order to cover increasingly low-tier points of interest. Our primary technical contribution is an algorithm that addresses both these problems by maximizing the quality of the worst day. We give theoretical results showing that this algorithm's competitive factor is within a factor two of the guarantee of the best available algorithm for a single day, across many variations of the problem. We also give detailed empirical evaluations using two distinct datasets: (a) anonymized Google historical visit data and (b) Foursquare public check-in data. We show first that the overall utility of our itineraries is almost identical to that of algorithms specifically designed to maximize total utility, while the utility of the worst day of our itineraries is roughly twice that obtained from other approaches. We then turn to evaluation based on human raters who score our itineraries only slightly below the itineraries created by human travel experts with deep knowledge of the area. View details
    Smart Reply: Automated Response Suggestion for Email
    Karol Kurach
    Sujith Ravi
    Tobias Kaufman
    Laszlo Lukacs
    Peter Young
    Vivek Ramavajjala
    Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2016).
    Preview abstract In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning. We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data. View details
    Arrival and departure in Social Networks
    Shaomei Wu
    Atish Das Sarma
    Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013
    Preview
    Online Selection of Diverse Results
    Debmalya Panigrahi
    Atish Das Sarma
    Proceedings of the 5th ACM international Conference on Web Search and Data Mining (2012), pp. 263-272
    Preview abstract The phenomenal growth in the volume of easily accessible information via various web-based services has made it essential for service providers to provide users with personalized representative summaries of such information. Further, online commercial services including social networking and micro-blogging websites, e-commerce portals, leisure and entertainment websites, etc. recommend interesting content to users that is simultaneously diverse on many different axes such as topic, geographic specificity, etc. The key algorithmic question in all these applications is the generation of a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm. We analyze the algorithm using theoretical techniques to show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that our algorithm performs even better than its analytical guarantees in practice, and also outperforms other candidate algorithms for the problem by a wide margin. View details
    Max-Cover in Map-Reduce
    Flavio Chierichetti
    Ravi Kumar
    Proceedings of the 19th international conference on World Wide Web, ACM, Raleigh, North Carolina (2010), pp. 231-240
    Preview abstract The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of the union. This classic problem occurs commonly in many settings in web search and advertising. For moderately-sized instances, a greedy algorithm gives an approximation of (1-1/e). However, the greedy algorithm requires updating scores of arbitrary elements after each step, and hence becomes intractable for large datasets. We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm. View details
    Stochastic Models for Tabbed Browsing
    Flavio Chierichetti
    Ravi Kumar
    Proceedings of the 19th international conference on World Wide Web, ACM, Raleigh, North Carolina (2010), pp. 241-250
    Preview
    Dense Subgraph Extraction
    David Gibson
    Ravi Kumar
    Kevin S. McCurley
    in: Mining Graph Data, John Wiley & Sons (2006), pp. 411-441
    Preview
    Evolution of two-sided markets
    Ravi Kumar
    Yury Lifshits
    WSDM (2010), pp. 311-320
    A characterization of online browsing behavior
    Ravi Kumar
    WWW (2010), pp. 561-570
    Search is dead!: long live search
    Elizabeth F. Churchill
    Marti Hearst
    Barney Pell
    WWW (2010), pp. 1337-1338
    Stochastic models for tabbed browsing
    Flavio Chierichetti
    Ravi Kumar
    WWW (2010), pp. 241-250
    Max-cover in map-reduce
    Flavio Chierichetti
    Ravi Kumar
    WWW (2010), pp. 231-240
    A translation model for matching reviews to objects
    Nilesh N. Dalvi
    Ravi Kumar
    Bo Pang
    CIKM (2009), pp. 167-176
    ShatterPlots: Fast Tools for Mining Large Graphs
    Ana Paula Appel
    Deepayan Chakrabarti
    Christos Faloutsos
    Ravi Kumar
    Jure Leskovec
    SDM (2009), pp. 802-813
    An analysis framework for search sequences
    Qiaozhu Mei
    Kristina Lisa Klinkner
    Ravi Kumar
    CIKM (2009), pp. 1991-1994
    A Characterization of Online Search Behavior
    Ravi Kumar
    IEEE Data Eng. Bull., vol. 32 (2009), pp. 3-11
    For a few dollars less: Identifying review pages sans human labels
    Luciano Barbosa
    Ravi Kumar
    Bo Pang
    HLT-NAACL (2009), pp. 494-502
    A web of concepts
    Nilesh N. Dalvi
    Ravi Kumar
    Bo Pang
    Raghu Ramakrishnan
    Philip Bohannon
    Sathiya Keerthi
    Srujana Merugu
    PODS (2009), pp. 1-12
    Matching Reviews to Objects using a Language Model
    Nilesh N. Dalvi
    Ravi Kumar
    Bo Pang
    EMNLP (2009), pp. 609-618
    Efficient Discovery of Authoritative Resources
    Ravi Kumar
    Kevin Lang
    Cameron Marlow
    ICDE (2008), pp. 1495-1497
    Pig latin: a not-so-foreign language for data processing
    Christopher Olston
    Benjamin Reed
    Utkarsh Srivastava
    Ravi Kumar
    SIGMOD Conference (2008), pp. 1099-1110
    Vanity fair: privacy in querylog bundles
    Rosie Jones
    Ravi Kumar
    Bo Pang
    CIKM (2008), pp. 853-862
    Preferential behavior in online groups
    Lars Backstrom
    Ravi Kumar
    Cameron Marlow
    Jasmine Novak
    WSDM (2008), pp. 117-128
    Connectivity structure of bipartite graphs via the KNC-plot
    Ravi Kumar
    Erik Vee
    WSDM (2008), pp. 129-138
    Microscopic evolution of social networks
    Jure Leskovec
    Lars Backstrom
    Ravi Kumar
    KDD (2008), pp. 462-470
    Relaxation in text search using taxonomies
    Marcus Fontoura
    Vanja Josifovski
    Ravi Kumar
    Christopher Olston
    PVLDB, vol. 1 (2008), pp. 672-683
    Social networks: looking ahead
    Ravi Kumar
    Alexander Tuzhilin
    Christos Faloutsos
    David Jensen
    Gueorgi Kossinets
    Jure Leskovec
    KDD (2008), pp. 1060
    On anonymizing query logs via token-based hashing
    Ravi Kumar
    Jasmine Novak
    Bo Pang
    WWW (2007), pp. 629-638
    Visualizing tags over time
    Micah Dubinko
    Ravi Kumar
    Joseph Magnani
    Jasmine Novak
    TWEB, vol. 1 (2007)
    The discoverability of the web
    Anirban Dasgupta
    Arpita Ghosh
    Ravi Kumar
    Christopher Olston
    Sandeep Pandey
    WWW (2007), pp. 421-430
    "I know what you did last summer": query logs and user privacy
    Rosie Jones
    Ravi Kumar
    Bo Pang
    CIKM (2007), pp. 909-914
    Anchor-based proximity measures
    Amruta Joshi
    Ravi Kumar
    Benjamin Reed
    WWW (2007), pp. 1131-1132
    Evolutionary clustering
    Deepayan Chakrabarti
    Ravi Kumar
    KDD (2006), pp. 554-560
    Estimating corpus size via queries
    Marcus Fontoura
    Vanja Josifovski
    Ravi Kumar
    Rajeev Motwani
    Shubha U. Nabar
    Rina Panigrahy
    Ying Xu 0002
    CIKM (2006), pp. 594-603
    Core algorithms in the CLEVER system
    Ravi Kumar
    Sridhar Rajagopalan
    ACM Trans. Internet Techn., vol. 6 (2006), pp. 131-152
    Visualizing tags over time
    Micah Dubinko
    Ravi Kumar
    Joseph Magnani
    Jasmine Novak
    WWW (2006), pp. 193-202
    Hierarchical topic segmentation of websites
    Ravi Kumar
    Kunal Punera
    KDD (2006), pp. 257-266
    Structure and evolution of online social networks
    Ravi Kumar
    Jasmine Novak
    KDD (2006), pp. 611-617
    Content, Metadata, and Behavioral Information: Directions for Yahoo! Research
    Raghu Ramakrishnan
    Ravi Kumar
    IEEE Data Eng. Bull., vol. 29 (2006), pp. 10-18
    Navigating Low-Dimensional and Hierarchical Population Networks
    Ravi Kumar
    David Liben-Nowell
    ESA (2006), pp. 480-491
    Multi-structural databases
    Ronald Fagin
    Ravi Kumar
    Jasmine Novak
    D. Sivakumar
    PODS (2005), pp. 184-195
    Efficient Implementation of Large-Scale Multi-Structural Databases
    Ronald Fagin
    Phokion G. Kolaitis
    Ravi Kumar
    Jasmine Novak
    D. Sivakumar
    VLDB (2005), pp. 958-969
    Variable latent semantic indexing
    Anirban Dasgupta
    Ravi Kumar
    KDD (2005), pp. 13-21
    The predictive power of online chatter
    Daniel Gruhl
    Ravi Kumar
    Jasmine Novak
    KDD (2005), pp. 78-87
    Discovering Large Dense Subgraphs in Massive Graphs
    David Gibson
    Ravi Kumar
    VLDB (2005), pp. 721-732
    On the Bursty Evolution of Blogspace
    Ravi Kumar
    Jasmine Novak
    World Wide Web, vol. 8 (2005), pp. 159-178
    Anti-aliasing on the web
    Jasmine Novak
    WWW (2004), pp. 30-39
    Fast discovery of connection subgraphs
    Christos Faloutsos
    Kevin S. McCurley
    KDD (2004), pp. 118-127
    Sic transit gloria telae: towards an understanding of the web's decay
    Mining and Knowledge Discovery from the Web
    Kevin S. McCurley
    ISPAN (2004), pp. 4-11
    Minimizing Wirelength in Zero and Bounded Skew Clock Trees
    Moses Charikar
    Jon M. Kleinberg
    Ravi Kumar
    Sridhar Rajagopalan
    Amit Sahai
    SIAM J. Discrete Math., vol. 17 (2004), pp. 582-595
    Structure and evolution of blogspace
    Ravi Kumar
    Jasmine Novak
    Commun. ACM, vol. 47 (2004), pp. 35-39
    Propagation of trust and distrust
    On the bursty evolution of blogspace
    Ravi Kumar
    Jasmine Novak
    WWW (2003), pp. 568-576
    A case for automated large-scale semantic annotation
    Stephen Dill
    Nadav Eiron
    David Gibson
    Daniel Gruhl
    Anant Jhingran
    Tapas Kanungo
    Kevin S. McCurley
    Sridhar Rajagopalan
    John A. Tomlin
    Jason Y. Zien
    J. Web Sem., vol. 1 (2003), pp. 115-132
    SemTag and seeker: bootstrapping the semantic web via automated semantic annotation
    Stephen Dill
    Nadav Eiron
    David Gibson
    Daniel Gruhl
    Anant Jhingran
    Tapas Kanungo
    Sridhar Rajagopalan
    John A. Tomlin
    Jason Y. Zien
    WWW (2003), pp. 178-186
    Self-similarity in the web
    Stephen Dill
    Ravi Kumar
    Kevin S. McCurley
    Sridhar Rajagopalan
    D. Sivakumar
    ACM Trans. Internet Techn., vol. 2 (2002), pp. 205-223
    The Web and Social Networks
    Ravi Kumar
    Sridhar Rajagopalan
    IEEE Computer, vol. 35 (2002), pp. 32-36
    Self-similarity in the Web
    Stephen Dill
    Ravi Kumar
    Kevin S. McCurley
    Sridhar Rajagopalan
    D. Sivakumar
    VLDB (2001), pp. 69-78
    Recommendation Systems: A Probabilistic Analysis
    Ravi Kumar
    Sridhar Rajagopalan
    J. Comput. Syst. Sci., vol. 63 (2001), pp. 42-61
    On Semi-Automated Web Taxonomy Construction
    Ravi Kumar
    Sridhar Rajagopalan
    WebDB (2001), pp. 91-96
    Graph structure in the Web
    Ravi Kumar
    Farzin Maghoul
    Sridhar Rajagopalan
    Raymie Stata
    Janet L. Wiener
    Computer Networks, vol. 33 (2000), pp. 309-320
    The Web as a Graph
    Ravi Kumar
    Sridhar Rajagopalan
    D. Sivakumar
    Eli Upfal
    PODS (2000), pp. 1-10
    Random walks with ``back buttons'' (extended abstract)
    Ronald Fagin
    Anna R. Karlin
    Jon M. Kleinberg
    Sridhar Rajagopalan
    Ronitt Rubinfeld
    Madhu Sudan
    STOC (2000), pp. 484-493
    Random graph models for the web graph
    Ravi Kumar
    Sridhar Rajagopalan
    D. Sivakumar
    Eli Upfal
    FOCS (2000), pp. 57-65
    Mining the Web's Link Structure
    Soumen Chakrabarti
    Byron Dom
    Ravi Kumar
    Sridhar Rajagopalan
    David Gibson
    Jon M. Kleinberg
    IEEE Computer, vol. 32 (1999), pp. 60-67
    Minimizing Wirelength in Zero and Bounded Skew Clock Trees
    Moses Charikar
    Jon M. Kleinberg
    Ravi Kumar
    Sridhar Rajagopalan
    Amit Sahai
    SODA (1999), pp. 177-184
    The Web as a Graph: Measurements, Models, and Methods
    Jon M. Kleinberg
    Ravi Kumar
    Sridhar Rajagopalan
    COCOON (1999), pp. 1-17
    On targeting Markov segments
    Moses Charikar
    Ravi Kumar
    Sridhar Rajagopalan
    STOC (1999), pp. 99-108
    Extracting Large-Scale Knowledge Bases from the Web
    Ravi Kumar
    Sridhar Rajagopalan
    VLDB (1999), pp. 639-650
    Trawling the Web for Emerging Cyber-Communities
    Ravi Kumar
    Sridhar Rajagopalan
    Computer Networks, vol. 31 (1999), pp. 1481-1493
    Topic Distillation and Spectral Filtering
    Soumen Chakrabarti
    Byron Dom
    David Gibson
    Ravi Kumar
    Sridhar Rajagopalan
    Artif. Intell. Rev., vol. 13 (1999), pp. 409-435
    Recommendation Systems: A Probabilistic Analysis
    Ravi Kumar
    Sridhar Rajagopalan
    FOCS (1998), pp. 664-673
    A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching
    Tracy Kimbrel
    R. Hugo Patterson
    Brian N. Bershad
    Pei Cao
    Edward W. Felten
    Garth A. Gibson
    Anna R. Karlin
    Kai Li
    OSDI (1996), pp. 19-34