# Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By

1 - 15 of 323 publications

Robust Repeated Auctions Under Heterogeneous Buyer Behavior

Shipra Agrawal

Constantinos Daskalakis

Proceedings of the Nineteenth ACM Conference on Economics and Computation, EC '18 (2018)

Preview abstract
We study revenue optimization in a repeated auction between a single seller and a single buyer. Traditionally, the design of repeated auctions requires strong modeling assumptions about the bidder behavior, such as it being myopic, infinite lookahead, or some specific form of learning behavior. Is it possible to design mechanisms which are simultaneously optimal against a multitude of possible buyer behaviors? We answer this question by designing a simple state-based mechanism that is simultaneously approximately optimal against a k-lookahead buyer for all k, a buyer who is a no-regret learner, and a buyer who is a policy-regret learner. Against each type of buyer our mechanism attains a constant fraction of the optimal revenue attainable against that type of buyer. We complement our positive results with almost tight impossibility results, showing that the revenue approximation tradeoffs achieved by our mechanism for different lookahead attitudes are near-optimal.
View details

Truthful Multi-Parameter Auctions with Online Supply: An Impossible Combination

Nikhil R. Devanur

Vasilis Syrgkanis

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018

Preview abstract
We study a basic auction design problem with online supply. There are two unit-demand bidders and two types of items. The first item type will arrive first for sure, and the second item type may or may not arrive. The auctioneer has to decide the allocation of an item immediately after each item arrives, but is allowed to compute payments after knowing how many items arrived. For this problem we show that there is no deterministic truthful and individually rational mechanism that, even with unbounded computational resources, gets any finite approximation factor to the optimal social welfare.
View details

PPP-Net: Platform-aware Progressive Search for Pareto-optimal Neural Architectures

Jin-Dong Dong

An-Chieh Cheng

Wei Wei

Min Sun

International Conference on Learning Representations (ICLR) Workshop (2018)

Preview abstract
Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performances in many applications such as image recognition. However, these techniques typically ignore platform-related constrictions (e.g., inference time and power consumptions) that can be critical for portable devices with limited computing resources. We propose PPP-Net: a multi-objective architectural search framework to automatically generate networks that achieve Pareto Optimality. PPP-Net employs a compact search space inspired by operations used in state-of-the-art mobile CNNs. PPP-Net has also adopted the progressive search strategy used in a recent literature (Liu et al. (2017a)). Experimental results demonstrate that PPP-Net achieves better performances in both (a) higher accuracy and (b) shorter inference time, comparing to the state-of-the-art CondenseNet.
View details

Hidden in Plain Sight: Classifying Emails Using Embedded Image Contents

Proceedings of the 2018 World Wide Web Conference (WWW 2018), pp. 1865-1874

Preview abstract
A vast majority of the emails received by people today are machine-generated by businesses communicating with consumers. While some emails originate as a result of a transaction (e.g., hotel or restaurant reservation confirmations, online purchase receipts, shipping notifications, etc.), a large fraction are commercial emails promoting an offer (a special sale, free shipping, available for a limited time, etc.). The sheer number of these promotional emails makes it difficult for users to read all these emails and decide which ones are actually interesting and actionable. In this paper, we tackle the problem of extracting information from commercial emails promoting an offer to the user. This information enables an email platform to build several new experiences that can unlock the value in these emails without the user having to navigate and read all of them. For instance, we can highlight offers that are expiring soon, or display a notification when there's an unexpired offer from a merchant if your phone recognizes that you are at that merchant's store.
A key challenge in extracting information from such commercial emails is that they are often image-rich and contain very little text. Training a machine learning (ML) model on a rendered image-rich email and applying it to each incoming email can be prohibitively expensive. In this paper, we describe a cost-effective approach for extracting signals from both the text and image content of commercial emails in the context of a free email platform that serves over a billion users around the world. The key insight is to leverage the template structure of emails, and use off-the-shelf OCR techniques to obtain the text from images to augment the existing text features offline. Compared to a text-only approach, we show that we are able to identify 9.12% more email templates corresponding to ~5% more emails being identified as offers. Interestingly, our analysis shows that this 5% improvement in coverage is across the board, irrespective of whether the emails were sent by large merchants or small local merchants, allowing us to deliver an improved experience for everyone.
View details

Recommendations for all : solving thousands of recommendation problems a day

Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE) (2018) (to appear)

Preview abstract
Recommendations are known to be an important part of several online experiences. Outside of media recommendation (music, movies, etc), online retailers have made use of product recommendations to help users make purchases. Product recommendation tends to be really hard because of the twin problems of sparsity and cold-start. Building a recommendation system that performs well in this setting is hard and is generally considered to need some expert tuning. However, all online retailers need to solve this problem well to provide good recommendations.
In this paper, we tackle this problem and describe an industrial-scale system called Sigmund where we solve tens of thousands of instances of the recommendation problem as a service for various online retailers. for customers. Sigmund was deployed to production in early 2014 and has been serving thousands of retailers. We describe several design decisions that we made in building Sigmund. We also share some of the lessons we learned from this experience –both from a machine learning perspective and a systems perspective. We hope that these lessons are useful for building future machine-learning services.
View details

Learning with Sparse and Biased Feedback for Personal Search

Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI) (2018), pp. 5219-5223

Preview abstract
Personal search, including email, on-device, and personal media search, has recently attracted a considerable attention from the information retrieval community. In this paper, we provide an overview of challenges and opportunities of learning with implicit user feedback (e.g., click data) in personal search. Implicit user feedback provides a convenient source of supervision for ranking models in personal search. This feedback, however, has two major drawbacks: it is highly sparse and biased due to the personal nature of queries and documents. We demonstrate how these drawbacks can be overcome, and empirically demonstrate the benefits of learning with implicit feedback in the context of a large-scale email search engine.
View details

Fast Algorithms for Knapsack via Convolution and Prediction

MohammadTaghi Hajiaghayi

Saeed Seddighin

Proceedings of the 50th Annual ACM Symposium on the Theory of Computing (STOC) (2018), pp. 1269-1282

Preview abstract
The knapsack problem is a fundamental problem in combinatorial optimization. It has been studied extensively from theoretical as well as practical perspectives as it is one of the most well-known NP-hard problems. The goal is to pack a knapsack of size t with the maximum value from a collection of n items with given sizes and values.
Recent evidence suggests that a classic O(nt) dynamic-programming solution for the knapsack problem might be the fastest in the worst case. In fact, solving the knapsack problem was shown to be equivalent to the (min,+) convolution problem (Cygan et al., ICALP 2017), which is thought to be facing a quadratic-time barrier. This hardness is in contrast to the more famous (+,·) convolution (generally known as polynomial multiplication), that has an O(nlogn)-time solution via Fast Fourier Transform.
Our main results are algorithms with near-linear running times for the knapsack problem, if either the values or sizes of items are small integers. More specifically, if item sizes are integers bounded by s_max, the running time of our algorithm is O~((n + t)s_max). If the item values are integers bounded by v_max, our algorithm runs in time O~(n + t v_max). Best previously known running times were O(nt), O(n^2 s_max) and O(n s_max v_max) (Pisinger, J. of Alg., 1999).
At the core of our algorithms lies the prediction technique: Roughly speaking, this new technique enables us to compute the convolution of two vectors in time O (n e_max) when an approximation of the solution within an additive error of e_max is available.
Our results also have implications regarding algorithms for several other problems including tree sparsity, tree separability and the unbounded knapsack problem, in the case when some of the relevant numerical input values are bounded.
View details

The Geometry of Random Features

Mark Rowland

Richard Turner

Adrian Weller

International Conference on Artificial Intelligence and Statistics (AISTATS) (2018)

Preview abstract
We present an in-depth examination of the effectiveness of radial basis function kernel (beyond
Gaussian) estimators based on orthogonal random feature maps. We show that orthogonal estimators
outperform state-of-the-art mechanisms that use iid sampling under weak conditions for tails of the
associated Fourier distributions. We prove that for the case of many dimensions, the superiority of the orthogonal transform over iid methods can be accurately measured by a property we define called the charm of the kernel, and that orthogonal random features provide optimal kernel estimators.
Furthermore, we provide the first theoretical results which explain why orthogonal random features outperform unstructured on downstream tasks such as kernel ridge regression by showing that orthogonal random features provide kernel algorithms with better spectral properties than the previous state-of-the-art. Our results enable practitioners more generally to estimate the benefits from applying orthogonal transforms.
View details

Incentive-Aware Learning for Large Markets

Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pp. 1369-1378

Preview abstract
In a typical learning problem, one key step is to use training data to pick one model from a collection of models that optimizes an objective function. In many multi-agent settings, the training data is generated through the actions of the agents, and the model is
used to make a decision (e.g., how to sell an item) that affects the agents. An illustrative example of this is the problem of learning the
reserve price in an auction. In such cases, the agents have an incentive to influence the training data (e.g., by manipulating their bids in
the case of an auction) to game the system and achieve a more favorable outcome. In this paper, we study such incentive-aware learning
problem in a general setting and show that it is possible to approximately optimize the objective function under two assumptions:
(i) each individual agent is a “small” (part of the market); and (ii) there is a cost associated with manipulation. For our illustrative
application, this nicely translates to a mechanism for setting approximately optimal reserve prices in auctions where no individual
agent has significant market share. For this application, we also show that the second assumption (that manipulations are costly) is
not necessary since we can “perturb” any auction to make it costly for the agents to manipulate.
View details

Neural Graph Learning: Training Neural Networks Using Graphs

Thang D. Bui

Sujith Ravi

Vivek Ramavajjala

Proceedings of 11th ACM International Conference on Web Search and Data Mining (WSDM) (2018)

Preview abstract
Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-regularized objective, namely Neural Graph Machines, that can combine the power of neural networks and label propagation. This work generalizes previous literature on graph-augmented training of neural networks, enabling it to be applied to multiple neural architectures (Feed-forward NNs, CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the neural networks to harness both labeled and unlabeled data by: (a) allowing the network to train using labeled data as in the supervised setting, (b) biasing the network to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation. Such architectures with the proposed objective can be trained efficiently using stochastic gradient descent and scaled to large graphs; its runtime is linear in the number of edges. The proposed joint training approach convincingly outperforms many existing methods on a wide range of tasks (multi-label classification on social graphs, news categorization, document classification and semantic intent classification), with multiple forms of graph inputs (including graphs with and without node-level features) and using different types of neural networks.
View details

Round Compression for Parallel Matching Algorithms

Aleksander Mądry

Artur Czumaj

Krzysztof Onak

Piotr Sankowski

Slobodan Mitrović

STOC 2018 (to appear)

Preview abstract
For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms?
A prominent example here is the maximum matching problem---one of the most classic graph problems.
It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n^(1+Omega(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power.
In this paper, we finally refute that perplexing possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is {\em almost exponential}: we are able to deliver a (2+epsilon)-approximation to maximum matching, for any fixed constant epsilon>0, in O((log log n)^2) rounds.
To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex--based graph partitioning, instead of the edge--based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(log n) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.
View details

Optimal Distributed Submodular Optimization via Sketching

Hossein Esfandiari

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2018), pp. 1138-1147

Preview abstract
As an important special case of submodular optimization problems, coverage problems are a central problem in optimization with a wide range of applications in data mining and machine learning. As
we need to handle larger and larger data sets, there is a clear need to develop distributed solutions to
these problems. While several results have been developed for distributed coverage maximizations, all
the existing method have notable limitations, e.g., they all achieve either suboptimal approximation
guarantees or suboptimal space and memory complexities. Moreover, most previous results for submodular maximization either explicitly or implicitly assume that one has a value oracle access to
the submodular function. Such a value oracle for coverage functions has the following form: given a subfamily of (input) subsets, determine the size of the union of the subsets in this subfamily.
View details