Umar Syed

I received a Ph.D. in Computer Science from Princeton University, where I was advised by Rob Schapire. I spent two years as a postdoctoral researcher at the University of Pennsylvania, hosted by Ben Taskar and Michael Kearns. My research interests are in machine learning.
Authored Publications
    Pricing a low-regret seller
    Hoda Heidari
    Sadra Yazdanbod
    Proceedings of the Thirty-Third International Conference on Machine Learning (ICML 2016)
    Abstract: As the number of ad exchanges has grown, publishers have turned to low-regret learning algorithms to decide which exchange offers the best price for their inventory. This in turn raises a question for the exchange: how should it set prices to attract as many sellers as possible and maximize revenue? We formulate this precisely as a learning problem, and present algorithms showing that merely knowing the counterparty is using a low-regret algorithm is enough for the exchange to run its own low-regret learning algorithm and find the optimal price.
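As an illustrative sketch of the seller's side of this setup, the snippet below runs a standard Hedge (multiplicative-weights) learner over a set of exchanges. All names and parameters are hypothetical, and this is only the generic low-regret learner the counterparty is assumed to use, not the paper's pricing algorithm for the exchange.

```python
import math

def hedge_seller(price_rounds, eta=0.1):
    """Hedge (multiplicative weights) over K exchanges, full information:
    each round the seller sees the price every exchange would have paid,
    collects the expected payoff of its randomized choice, and reweights."""
    k = len(price_rounds[0])
    weights = [1.0] * k
    expected_total = 0.0
    cumulative = [0.0] * k          # payoff of each fixed exchange in hindsight
    for prices in price_rounds:
        s = sum(weights)
        probs = [w / s for w in weights]
        expected_total += sum(p * r for p, r in zip(probs, prices))
        for i, r in enumerate(prices):
            cumulative[i] += r
            weights[i] *= math.exp(eta * r)   # reward form of the update
    return expected_total, max(cumulative)
```

Against a stationary environment the learner's payoff rapidly approaches that of the best fixed exchange, which is exactly the regularity the exchange can exploit when setting prices.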
    Abstract: We introduce a novel, data-driven way of predicting the battery consumption of apps. The state-of-the-art models used to attribute battery consumption to apps are based on micro-benchmark experiments, carried out on controlled setups where one can measure how much battery is consumed by each internal resource (CPU, Bluetooth, WiFi, etc.). The battery blame allocated to an app is then simply the sum of the blames of the resources it consumed. We argue that models of this type do not capture the way phones work "in the wild", and propose instead to train a regression model on data collected from logs. We show that this type of learning is correct in the sense that, under some assumptions, we can recover the true battery discharge rate of each component. We present experimental results in which our model consistently makes better predictions than a model trained on micro-benchmarks.
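The recover-the-true-rates claim can be pictured with a toy least-squares fit on noiseless logs. The two-resource setup and all names below are illustrative assumptions, not the paper's actual model.

```python
def fit_battery_rates(logs):
    """Least-squares fit of per-component discharge rates from usage logs.

    logs: list of (cpu_seconds, wifi_seconds, battery_drop) tuples.
    Solves the 2x2 normal equations for drop ~ r_cpu*cpu + r_wifi*wifi.
    """
    a11 = sum(c * c for c, w, d in logs)
    a12 = sum(c * w for c, w, d in logs)
    a22 = sum(w * w for c, w, d in logs)
    b1 = sum(c * d for c, w, d in logs)
    b2 = sum(w * d for c, w, d in logs)
    det = a11 * a22 - a12 * a12
    r_cpu = (b1 * a22 - b2 * a12) / det
    r_wifi = (a11 * b2 - a12 * b1) / det
    return r_cpu, r_wifi
```

When the logs are actually generated by fixed per-component rates, the regression recovers them exactly; with real, noisy logs one gets the best linear fit instead.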
    Where to sell: Simulating auctions from learning algorithms
    Hamid Nazerzadeh
    Proceedings of the Seventeenth ACM Conference on Economics and Computation (EC2016)
    Abstract: Ad exchange platforms connect online publishers and advertisers and facilitate the sale of billions of impressions every day. We study these environments from the perspective of a publisher who wants to find the profit-maximizing exchange for selling his inventory. Ideally, the publisher would run an auction among exchanges, but this is not possible due to technological and other practical considerations; instead, the publisher must send each impression to one of the exchanges along with an asking price. We model the problem as a variant of the multi-armed bandit problem in which the exchanges (arms) can behave strategically in order to maximize their own profit. We propose mechanisms that find the best exchange with sub-linear regret and have desirable incentive properties.
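For intuition about the bandit view of routing impressions, here is the classical UCB1 baseline, which finds the best exchange with sub-linear regret when exchanges are non-strategic. This is only the textbook algorithm, not the incentive-aware mechanism the paper proposes; the names are illustrative.

```python
import math

def ucb1(reward_fn, n_arms, horizon):
    """UCB1: route each impression to the exchange (arm) with the highest
    upper confidence bound on its mean payoff."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # play each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        total += r
    return total, counts
```

A strategic exchange can defeat this baseline by shading its early payments, which is why the paper needs mechanisms with explicit incentive properties.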
    Learning with Deep Cascades
    Proceedings of the Twenty-Sixth International Conference on Algorithmic Learning Theory (ALT 2015)
    Abstract: We introduce a broad learning model formed by cascades of predictors, Deep Cascades, structured as general decision trees in which the leaf predictors or node questions may be members of rich function families. We present new, detailed, data-dependent theoretical guarantees for learning with Deep Cascades with complex leaf predictors or node questions, expressed in terms of the Rademacher complexities of the sub-families composing these sets of predictors and the fraction of sample points correctly classified at each leaf. These general guarantees can guide the design of a variety of algorithms for deep cascade models, and we give a detailed description of two such algorithms. Our second algorithm uses SVM predictors as node and leaf classifiers, and we report the results of experiments comparing its performance with that of SVMs combined with polynomial kernels.
    Structural maxent models
    Vitaly Kuznetsov
    Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)
    Abstract: We present a new class of density estimation models, Structural Maxent models, with feature functions selected from a union of possibly very complex sub-families that nevertheless benefit from strong learning guarantees. The design of our models is based on a new principle, supported by uniform convergence bounds, that takes into consideration the complexity of the different sub-families composing the full set of features. We prove new data-dependent learning bounds for our models, expressed in terms of the Rademacher complexities of these sub-families. We also prove a duality theorem, which we use to derive our Structural Maxent algorithm. We give a full description of the algorithm, including the details of its derivation, and report the results of several experiments demonstrating that its performance improves on that of existing L1-norm regularized Maxent algorithms. We similarly define conditional Structural Maxent models for multi-class classification problems. These are conditional probability models that also make use of a union of possibly complex feature sub-families. We prove a duality theorem for these models as well, which reveals their connection with existing binary and multi-class deep boosting algorithms.
    Abstract: We give the first O(1/sqrt{T})-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm is optimal up to a poly(log(T)) factor in terms of error and requires only O(log T) memory. It aims to learn a hidden database-vector w* in R^d in order to accurately answer a stream of queries regarding the hidden database, which arrive in an online fashion from some unknown distribution D. We assume the distribution D is defined on the neighborhood of a low-dimensional manifold. The algorithm runs in O(dD) time per query, where d is the dimensionality of the query-space. Contrary to the classical setting, there is no separate training set used by the algorithm to learn the database: the stream on which the algorithm is evaluated must also be used to learn the database-vector. The algorithm has access only to a binary oracle O that answers whether a particular linear function of the database-vector plus random noise is larger than a threshold specified by the algorithm. We note that we allow a significant O(D) amount of noise to be added, whereas other works have focused on the low-noise o(sqrt{D}) setting. For a stream of T queries, our algorithm achieves an average error of O(1/sqrt{T}) by filtering out random noise and adapting the threshold values given to the oracle based on its previous answers, thereby recovering with high precision the projection of the database-vector w* onto the manifold defining the query-space. Our algorithm may also be applied in the adversarial machine learning context to compromise machine learning engines, by heavily exploiting the vulnerabilities of systems that output only a binary signal in the presence of significant noise.
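The threshold-adaptation idea can be pictured, in the noiseless case, as a binary search for the point at which the comparison oracle flips its answer. This toy deliberately ignores the noise filtering that is the paper's main contribution; the function names are illustrative.

```python
def estimate_dot(oracle, lo=-1.0, hi=1.0, iters=30):
    """Binary-search the threshold at which a comparison oracle flips.

    oracle(theta) returns True iff <w*, q> > theta (noiseless case);
    the flip point then estimates the linear functional <w*, q>.
    With noise, each comparison would be repeated and averaged.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if oracle(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Thirty halvings of the interval already pin the value down to roughly 2/2^30, which is why the per-query work stays logarithmic in the target precision.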
    Multi-Class Deep Boosting
    Vitaly Kuznetsov
    Advances in Neural Information Processing Systems (2014)
    Abstract: We present new ensemble learning algorithms for multi-class classification. Our algorithms can use as a base classifier set a family of deep decision trees, or other rich or complex families, and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multi-class classification setting, expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set and the mixture weight assigned to each sub-family. These bounds are finer than existing ones, both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble's mixture weights. We introduce and discuss several new multi-class ensemble algorithms benefiting from these guarantees, prove positive results for the H-consistency of several of them, and report the results of experiments showing that their performance compares favorably with that of multi-class versions of AdaBoost and Logistic Regression and their L1-regularized counterparts.
    Abstract: Motivated by real-time advertising exchanges, we analyze the problem of pricing inventory in a repeated posted-price auction. We consider both the case of a truthful buyer and that of a surplus-maximizing buyer: the former makes decisions myopically on every round, while the latter may react strategically to our algorithm, forgoing short-term surplus in order to trick the algorithm into setting better prices in the future. We further assume that a buyer's valuation of a good is a function of a context vector that describes the good being sold. We give the first algorithm attaining sublinear O(T^{2/3}) regret in the contextual setting against a surplus-maximizing buyer. We also extend this result to repeated second-price auctions with multiple buyers.
    Deep boosting
    Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)
    Abstract: We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and achieve high accuracy without overfitting the data. The key to the algorithm's success is a capacity-conscious criterion for the selection of hypotheses. We give new data-dependent learning bounds for convex ensembles, expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees, since it seeks to minimize the corresponding learning bound. We give a full description of the algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.
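The flavor of a capacity-conscious selection criterion can be sketched with a minimal AdaBoost-style loop in which each round's hypothesis is chosen by penalized edge rather than raw edge. This is only a schematic illustration of the idea; the penalty form, names, and parameters below are assumptions, not DeepBoost's actual objective.

```python
import math

def boost(X, y, hypos, complexities, rounds=5, lam=0.02):
    """AdaBoost-style loop with a capacity-conscious twist: each round
    picks the hypothesis maximizing (weighted edge - lam * complexity),
    so complex sub-families must earn their extra capacity."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []                        # (alpha, hypothesis) pairs
    for _ in range(rounds):
        def penalized_edge(j):
            edge = sum(wi * yi * hypos[j](xi) for wi, yi, xi in zip(w, y, X))
            return edge - lam * complexities[j]
        j = max(range(len(hypos)), key=penalized_edge)
        edge = sum(wi * yi * hypos[j](xi) for wi, yi, xi in zip(w, y, X))
        edge = max(min(edge, 1 - 1e-9), -1 + 1e-9)   # clip for the log below
        alpha = 0.5 * math.log((1 + edge) / (1 - edge))
        ensemble.append((alpha, hypos[j]))
        w = [wi * math.exp(-alpha * yi * hypos[j](xi))
             for wi, yi, xi in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

With lam = 0 this reduces to plain AdaBoost on the pooled hypothesis set; the penalty is what steers the ensemble toward simpler sub-families when their edges are comparable.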
    Abstract: Google Research recently tested a massive online class model for an internal engineering education program, with machine learning as the topic, that blended theoretical concepts and Google-specific software tool tutorials. The goal of this training was to foster engineering capacity to leverage machine learning tools in future products. The course was delivered both synchronously and asynchronously, and students had the choice between studying independently or participating with a group. Since all students are company employees, unlike in most publicly offered MOOCs we can continue to measure the students' behavioral change long after the course is complete. This paper describes the course, outlines the available data set, and presents directions for analysis.
    Abstract: Inspired by real-time ad exchanges for online display advertising, we consider the problem of inferring a buyer's value for a good when the buyer repeatedly interacts with the seller through a posted-price mechanism. We model the buyer as a strategic agent interested in maximizing her long-term surplus, and we are interested in optimizing seller revenue. We show conditions under which the seller cannot hope to gain an advantage by learning the buyer's value; that is, the buyer can always manipulate the exchange to hide her value. This result is accompanied by a seller algorithm that achieves no-regret when the buyer is unable to incur the short-term costs of such manipulation.
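The contrast between the two buyer models can be illustrated with a toy repeated posted-price interaction against a myopic (truthful) buyer. Binary-search pricing here is an illustrative baseline, not the paper's algorithm; a strategic buyer could reject the second offer below to keep all later prices low, which is exactly the manipulation the result formalizes.

```python
def posted_price_revenue(value, rounds=50):
    """Repeated posted prices against a myopic buyer: the seller
    binary-searches the buyer's value from accept/reject signals,
    accumulating the revenue of accepted offers along the way."""
    lo, hi, revenue = 0.0, 1.0, 0.0
    for _ in range(rounds):
        price = (lo + hi) / 2
        if price <= value:        # myopic buyer accepts any price <= value
            revenue += price
            lo = price
        else:
            hi = price
    return revenue
```

Against a myopic buyer the accept/reject stream leaks the value; against a surplus-maximizing buyer the same signals can be manufactured to mislead the search.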
    An $\tilde{O}(\frac{1}{\sqrt{T}})$-error online algorithm for retrieving heavily perturbated statistical databases in the low-dimensional querying model
    There's something about MRAI: Timing diversity can exponentially worsen BGP convergence
    Jennifer Rexford
    Proceedings of the Thirtieth IEEE International Conference on Computer Communications (INFOCOM 2011)
    Graphical models for bandit problems
    Kareem Amin
    Michael Kearns
    Uncertainty in Artificial Intelligence: Proceedings of the Twenty-Seventh Conference (UAI 2011)
    Bandits, query learning, and the haystack dimension
    Kareem Amin
    Michael Kearns
    Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT 2011)
    Semi-supervised learning with adversarially missing label information
    Ben Taskar
    Advances in Neural Information Processing Systems 24 (NIPS 2010)
    A reduction from apprenticeship learning to classification
    Robert E. Schapire
    Advances in Neural Information Processing Systems 24 (NIPS 2010)
    Private and third-party randomization in risk-sensitive equilibrium concepts
    Mickey Brautbar
    Michael Kearns
    Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010)
    Adapting to the shifting intent of search queries
    Aleksandrs Slivkins
    Nina Mishra
    Advances in Neural Information Processing Systems 23 (NIPS 2009)
    Using automatically transcribed dialogs to learn user models in a spoken dialog system
    Jason Williams
    Proceedings of the Forty-Sixth Annual Meeting of the Association for Computational Linguistics (ACL 2008)
    Enzyme function prediction with interpretable models
    Golan Yona
    Methods in Molecular Biology: Computational Systems Biology, Humana Press (2008)
    Apprenticeship learning using linear programming
    Michael Bowling
    Robert E. Schapire
    Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML 2008)
    A game-theoretic approach to apprenticeship learning
    Robert E. Schapire
    Advances in Neural Information Processing Systems 21 (NIPS 2007)
    Imitation learning with a value-based prior
    Robert E. Schapire
    Uncertainty in Artificial Intelligence: Proceedings of the Twenty-Third Conference (UAI 2007)
    Using a mixture of probabilistic decision trees for direct prediction of protein function
    Golan Yona
    Proceedings of the Seventh Annual International Conference on Research in Computational Biology (RECOMB 2003)