
Elad Hazan

For publications, courses, and other material, please see: https://www.ehazan.com
Authored Publications
    Boosting Simple Learners
    Noga Alon
    Alon Gonen
    Shay Moran
    TheoretiCS (2023)
    Abstract: We study boosting algorithms under the assumption that the given weak learner outputs hypotheses from a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are “rules of thumb” from an “easy-to-learn class”. Formally, we assume the class of weak hypotheses has bounded VC dimension. We focus on two main questions: (i) Oracle complexity: we show that this restriction allows us to circumvent a lower bound due to Freund and Schapire (’95, ’12) on the number of calls to the weak learner. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, our boosting procedure circumvents the lower bound by using more complex aggregation rules, and we show this to be necessary. (ii) Expressivity: which tasks can be learned by boosting a base class of bounded VC dimension? Can concepts that are “far away” from the class be learned? Towards answering this question we identify a combinatorial-geometric parameter that quantifies the expressivity of a base class when used as part of a boosting procedure. Along the way, we establish and exploit connections with discrepancy theory.
    Multiclass Boosting and the Cost of Weak Learning
    Indraneel Mukherjee
    Nataly Brukhim
    Robert Schapire
    Shay Moran
    Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
    Abstract: Boosting is an algorithmic approach based on the idea of combining weak, moderately inaccurate hypotheses into a strong, accurate one. In this work we study multiclass boosting with a possibly large number of classes or categories. Multiclass boosting can be formulated in various ways. Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an “easy-to-learn” base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss. This is in contrast with other, more complicated losses that have often been considered in the past. The goal of the overall boosting algorithm is then to learn a combination of weak hypotheses by repeatedly calling the weak learner. We study the resources required for boosting, especially how they depend on $k$, the number of classes. For the booster, these include the number of examples needed for learning and the number of oracle calls to the weak learner. For the weak learner, we measure the required accuracy, that is, how close the returned hypothesis must be to the best in the base class, or alternatively, the number of unweighted samples the boosting algorithm feeds the weak learner in each round. We find that the boosting algorithm itself only requires $O(\log k)$ samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by any weak learner is at least polynomial in $k$, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in $k$, showing that the returned weak hypotheses must be nearly the best in their class when $k$ is large. We also prove a trade-off between the number of oracle calls and the resources required of the weak learner, meaning that the fewer the calls to the weak learner, the more that is demanded of it on each call.
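    The following is a minimal sketch of a multiclass boosting loop in the SAMME style (Zhu et al., 2009), a standard multiclass variant of AdaBoost; it is not the algorithm of this paper, and is included only to make the booster/weak-learner interaction and the appearance of $k$ concrete. The `weak_learner` callable is a hypothetical stand-in for the agnostic learner the abstract assumes.

```python
# A SAMME-style multiclass AdaBoost loop -- a standard variant, NOT this
# paper's algorithm, shown to illustrate the booster / weak-learner protocol.
import jax.numpy as jnp

def boost(X, y, k, weak_learner, rounds=50):
    n = X.shape[0]
    w = jnp.full(n, 1.0 / n)             # distribution the booster maintains
    ensemble = []                        # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h = weak_learner(X, y, w)        # oracle call: returns X -> {0..k-1}
        miss = (h(X) != y).astype(jnp.float32)
        err = jnp.clip(jnp.sum(w * miss), 1e-10, 1 - 1e-10)
        # the log(k-1) term is where the number of classes enters the update
        alpha = jnp.log((1 - err) / err) + jnp.log(k - 1.0)
        w = w * jnp.exp(alpha * miss)    # up-weight missed examples
        w = w / jnp.sum(w)
        ensemble.append((alpha, h))

    def predict(X_new):
        votes = jnp.zeros((X_new.shape[0], k))
        for alpha, h in ensemble:        # weighted plurality vote
            votes = votes.at[jnp.arange(X_new.shape[0]), h(X_new)].add(alpha)
        return jnp.argmax(votes, axis=1)
    return predict
```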
    Abstract: We study the question of how to aggregate controllers for dynamical systems in order to improve their performance. To this end, we propose a framework of boosting for online control. Our main result is an efficient boosting algorithm that combines weak controllers into a provably more accurate one. Empirical evaluation on a host of control settings supports our theoretical findings.
    Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking
    Alex Jiawen Yu
    Anirudha Majumdar
    Cyril Zhang
    John Olof Hallman
    Karan Singh
    Paula Nicoleta Gradu
    Udaya Ghai
    Differentiable Computer Vision, Graphics, and Physics in Machine Learning Workshop at NeurIPS 2020 (to appear)
    Abstract: We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite. The introduced environments allow auto-differentiation through the simulation dynamics, and thereby permit fast training of controllers. The library features several popular environments, including classical control settings from OpenAI Gym (Brockman et al., 2016). We also provide a novel differentiable environment, based on deep neural networks, that simulates medical ventilation. We give several use cases of new scientific results obtained using the library. These include a medical ventilator simulator and controller, an adaptive control method for time-varying linear dynamical systems, and new gradient-based methods for control of linear dynamical systems with adversarial perturbations.
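    The following toy sketch, which does not use the Deluca API, illustrates what a natively differentiable environment enables: because the rollout is a pure JAX function, gradients of the total cost flow through the simulation dynamics into the controller parameters. The dynamics and controller here are hypothetical.

```python
# Toy differentiable environment (NOT the Deluca API): gradients of the
# rollout cost flow through the simulated dynamics into the controller.
import jax
import jax.numpy as jnp

def step(x, u):                      # hypothetical double-integrator dynamics
    A = jnp.array([[1.0, 0.1], [0.0, 1.0]])
    B = jnp.array([[0.0], [0.1]])
    return A @ x + B @ u

def rollout_cost(K, x0, horizon=50):
    def body(x, _):
        u = -K @ x                   # linear state-feedback controller
        x_next = step(x, u)
        cost = jnp.sum(x_next**2) + 0.01 * jnp.sum(u**2)
        return x_next, cost
    _, costs = jax.lax.scan(body, x0, None, length=horizon)
    return jnp.sum(costs)

K = jnp.zeros((1, 2))
x0 = jnp.array([1.0, 0.0])
for _ in range(200):                 # plain gradient descent on the controller
    K = K - 0.05 * jax.grad(rollout_cost)(K, x0)
```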
    Provably Efficient Maximum Entropy Exploration
    Sham Kakade
    Karan Singh
    Abby Van Soest
    ICML (2019)
    Abstract: Suppose an agent is in a (possibly unknown) Markov decision process in the absence of a reward signal: what might we hope the agent can efficiently learn to do? One natural, intrinsically defined objective is for the agent to learn a policy that induces a distribution over the state space that is as uniform as possible, as measured in an entropic sense. We provide an efficient algorithm to construct such a maximum-entropy exploratory policy, given access to a black-box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting, where we have sample-based access to the MDP, our proposed algorithm is provably efficient in terms of both sample size and computational complexity. Key to our algorithmic methodology is the conditional gradient method (a.k.a. the Frank-Wolfe algorithm), which uses an approximate MDP solver.
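    A minimal tabular sketch of the approach described, under simplifying assumptions: the transition tensor `P[s, a, s']` is known, the state distribution is the finite-horizon average, the mixture is initialized at uniform, and finite-horizon value iteration stands in for the black-box planning oracle.

```python
# Max-entropy exploration via Frank-Wolfe: a tabular sketch under the
# assumptions above, not a faithful reproduction of the paper's algorithm.
import jax.numpy as jnp

def state_distribution(P, pi, horizon, s0):
    """Average state distribution of deterministic policy pi over the horizon."""
    d = jnp.zeros(P.shape[0]).at[s0].set(1.0)
    avg = jnp.zeros_like(d)
    T = P[jnp.arange(P.shape[0]), pi]        # (S, S') transitions under pi
    for _ in range(horizon):
        avg = avg + d / horizon
        d = d @ T
    return avg

def plan(P, r, horizon):
    """Planning oracle: greedy policy for state reward r over the horizon."""
    V = jnp.zeros(P.shape[0])
    Q = r[:, None] + P @ V                   # Q[s, a]
    for _ in range(horizon - 1):
        V = Q.max(axis=1)
        Q = r[:, None] + P @ V
    return Q.argmax(axis=1)

def max_entropy_explore(P, s0, horizon=30, iters=50):
    d_mix = jnp.full(P.shape[0], 1.0 / P.shape[0])   # mixture state distribution
    policies = []
    for t in range(iters):
        r = -jnp.log(d_mix + 1e-8) - 1.0             # gradient of entropy at d_mix
        pi = plan(P, r, horizon)                     # conditional-gradient oracle call
        eta = 2.0 / (t + 2)                          # standard Frank-Wolfe step size
        d_mix = (1 - eta) * d_mix + eta * state_distribution(P, pi, horizon, s0)
        policies.append((eta, pi))
    return policies, d_mix
```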
    Abstract: We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle. In the general setting of prediction with expert advice, [11] established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, which include GANs (generative adversarial networks) as a special case.
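    For intuition on the oracle model, here is a generic follow-the-perturbed-leader (FTPL) loop in which the offline oracle is simulated by minimizing cumulative loss plus a random linear perturbation over a finite candidate set; this sketches the oracle interface only, not the paper's reduction.

```python
# Generic FTPL loop; the argmin over `candidates` is a toy stand-in for an
# offline optimization oracle over a non-convex decision set.
import jax
import jax.numpy as jnp

def ftpl(candidates, loss_fns, key, eta=0.1):
    """candidates: (m, d) points; loss_fns: list of per-round loss functions."""
    cum_loss = jnp.zeros(candidates.shape[0])
    total = 0.0
    for loss in loss_fns:
        key, sub = jax.random.split(key)
        # "oracle call": minimize cumulative loss plus a linear perturbation
        perturb = candidates @ jax.random.normal(sub, (candidates.shape[1],))
        play = candidates[jnp.argmin(cum_loss + eta * perturb)]
        total += loss(play)
        cum_loss = cum_loss + jnp.array([loss(c) for c in candidates])
    return total
```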
    Abstract: We study the optimal regret bounds for control in linear dynamical systems with adversarially changing strongly convex cost functions. This framework includes several well-studied and influential algorithms, such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret that scales as $O(\sqrt{T})$, where $T$ is the time horizon, or number of iterations. We show that the optimal regret in this fundamental setting can be significantly smaller, scaling as $O(\mathrm{poly}(\log T))$, closing the gap in the literature between known upper and lower bounds. This regret bound is achieved by an efficient iterative gradient-based method.
    Efficient Full-Matrix Adaptive Regularization
    Naman Agarwal
    Brian Anderson Bullins
    Karan Singh
    Cyril Zhang
    Yi Zhang
    ICML (2019)
    Abstract: Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters in machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis of adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, consists of efficient computation of the inverse square roots of low-rank matrices. Our preliminary experiments show improved convergence rates for GGT across a variety of synthetic tasks and standard deep learning benchmarks.
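    A hedged reconstruction of the core GGT step from the abstract: precondition the current gradient by $(GG^\top)^{-1/2}$, where $G$ stacks a window of $r$ recent gradients, computed via the small $r \times r$ Gram matrix $G^\top G$. Windowing, decay, and other details from the paper are omitted.

```python
# Sketch of GGT preconditioning: apply (G G^T)^{-1/2} to the gradient using
# only an r x r eigendecomposition, since r << d.
import jax.numpy as jnp

def ggt_direction(G, g, eps=1e-4):
    """G: (d, r) window of recent gradients; g: (d,) current gradient."""
    lam, V = jnp.linalg.eigh(G.T @ G)          # r x r eigendecomposition
    lam = jnp.clip(lam, 0.0)
    U = G @ (V / jnp.sqrt(lam + 1e-12))        # left singular vectors of G
    coeffs = U.T @ g
    on_range = U @ (coeffs / (jnp.sqrt(lam) + eps))  # (GG^T)^{-1/2} on range(G)
    off_range = (g - U @ coeffs) / eps               # null space, scaled by 1/eps
    return on_range + off_range
```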
    Online Control with Adversarial Disturbances
    Brian Anderson Bullins
    Sham Kakade
    Karan Singh
    ICML (2019)
    Abstract: We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that does nearly as well as a procedure with full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in that our model allows for adversarial noise in the dynamics and for general convex costs.
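    A sketch in the spirit of a disturbance-action controller, $u_t = -Kx_t + \sum_i M_i w_{t-i}$: disturbances are recovered from observed transitions and the matrices $M$ are updated by online gradient descent on a truncated counterfactual cost. This is a loose reconstruction assuming known, stabilizing $(A, B, K)$; the exact algorithm and its analysis are in the paper.

```python
# Hedged sketch of a disturbance-action controller with an online gradient
# update on a truncated counterfactual cost (details differ from the paper).
import jax
import jax.numpy as jnp

H = 5                                             # policy history length

def counterfactual_cost(M, K, A, B, w_hist):
    """Re-simulate the recorded window from zero state under policy M."""
    x = jnp.zeros(A.shape[0])
    cost = 0.0
    for t in range(H, w_hist.shape[0]):
        u = -K @ x + jnp.einsum('hij,hj->i', M, w_hist[t - H:t])
        x = A @ x + B @ u + w_hist[t]
        cost = cost + jnp.sum(x**2) + jnp.sum(u**2)
    return cost

def gpc_step(M, K, A, B, x, x_next, u, w_hist, lr=0.01):
    w = x_next - A @ x - B @ u                    # recover the disturbance
    w_hist = jnp.roll(w_hist, -1, axis=0).at[-1].set(w)
    M = M - lr * jax.grad(counterfactual_cost)(M, K, A, B, w_hist)
    return M, w_hist
```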
    Private Learning Implies Online Learning: An Efficient Reduction
    Alon Gonen
    Shay Moran
    NeurIPS (2019), spotlight (to appear)
    Abstract: We study the relationship between the notions of differentially private learning and online learning in games. Several recent works have shown that differentially private learning implies online learning, but an open problem of Neel, Roth, and Wu (2018) asks whether this implication is efficient. Specifically, does an efficient differentially private learner imply an efficient online learner? In this paper we resolve this open question in the context of pure differential privacy. We derive an efficient black-box reduction from differentially private learning to online learning from expert advice.
    Online Learning of Quantum States
    Scott Aaronson
    Xinyi Chen
    Ashwin Nayak
    NeurIPS (2018)
    Abstract: Suppose we have many copies of an unknown $n$-qubit state $\rho$. We measure some copies of $\rho$ using a known two-outcome measurement $E_1$, then other copies using a measurement $E_2$, and so on. At each stage $t$, we generate a current hypothesis $\sigma_t$ about the state $\rho$, using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that $|\mathrm{Tr}(E_i \sigma_t) - \mathrm{Tr}(E_i \rho)|$, the error in our prediction for the next measurement, is at least $\varepsilon$ at most $O(n/\varepsilon^2)$ times. Even in the “non-realizable” setting, where there could be arbitrary noise in the measurement outcomes, we show how to output hypothesis states that do significantly worse than the best possible states at most $O(\sqrt{Tn})$ times on the first $T$ measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states to the online and regret-minimization settings. We give three different ways to prove our results, using convex optimization, quantum postselection, and sequential fat-shattering dimension, which have different advantages in terms of parameters and portability.
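    One of the paper's three proof routes is convex optimization (regularized follow-the-leader with a von Neumann entropy regularizer), which yields a matrix-multiplicative-weights style update over density matrices. A minimal sketch, assuming Hermitian two-outcome measurements $E_t$, observed statistics $b_t$, and the squared loss $(\mathrm{Tr}(E_t\sigma) - b_t)^2$:

```python
# Matrix-multiplicative-weights sketch for online learning of quantum states:
# the hypothesis is always a valid density matrix (PSD, unit trace).
import jax.numpy as jnp

def expm_hermitian(H):
    lam, V = jnp.linalg.eigh(H)
    return (V * jnp.exp(lam)) @ V.conj().T

def online_quantum_states(Es, bs, eta=0.1):
    d = Es[0].shape[0]
    grad_sum = jnp.zeros((d, d), dtype=jnp.complex64)
    predictions = []
    for E, b in zip(Es, bs):
        expG = expm_hermitian(-eta * grad_sum)
        sigma = expG / jnp.trace(expG).real        # current hypothesis state
        pred = jnp.trace(E @ sigma).real
        predictions.append(pred)
        grad_sum = grad_sum + 2.0 * (pred - b) * E # subgrad of (Tr(E s) - b)^2
    return predictions
```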
    Abstract: We study the following question: given an efficient approximation algorithm for an optimization problem, can we learn efficiently in the same setting? We give a formal affirmative answer to this question in the form of a reduction from online learning to offline approximate optimization, using an efficient algorithm that guarantees near-optimal regret. The algorithm is efficient in terms of the number of oracle calls to a given approximation oracle: it makes only logarithmically many such calls per iteration. This resolves an open question by Kalai and Vempala, and by Garber. Furthermore, our result applies to the more general setting of improper learning.
    Abstract: Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization: linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p > 2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
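    A scalar toy, not the paper's experiments, showing the preconditioning effect: writing $w = w_1 w_2$ turns gradient flow on $L(w)$ into $\dot{w} = -(w_1^2 + w_2^2)\,L'(w)$, an adaptively scaled update.

```python
# Scalar toy: gradient descent on a quartic loss, directly vs. through the
# overparameterized product w = w1 * w2; compare distance to the optimum w = 1.
import jax
import jax.numpy as jnp

loss = lambda w: (w - 1.0) ** 4          # a simple l_p-style loss with p = 4

def run_direct(steps=2000, lr=0.01):
    w = 0.01
    for _ in range(steps):
        w = w - lr * jax.grad(loss)(w)
    return w

def run_overparam(steps=2000, lr=0.01):
    params = jnp.array([0.1, 0.1])       # w = w1 * w2 = 0.01, same start
    obj = lambda p: loss(p[0] * p[1])
    for _ in range(steps):
        params = params - lr * jax.grad(obj)(params)
    return params[0] * params[1]

print(run_direct(), run_overparam())
```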
    Spectral Filtering for General Linear Dynamical Systems
    Holden Lee
    Karan Singh
    Cyril Zhang
    Yi Zhang
    NeurIPS (2018)
    Abstract: We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases.
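    For background, a sketch of the symmetric-case spectral filtering that this paper extends (Hazan et al., 2017): the filters are top eigenvectors of a fixed Hankel matrix, features are convolutions of the input history with those filters, and the prediction is linear in the features. Online learning of the readout and the method's autoregressive terms are replaced here by offline least squares.

```python
# Spectral filtering features, symmetric case: filters are top eigenvectors
# of the fixed Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j)).
import jax.numpy as jnp

def spectral_filters(T, k):
    i = jnp.arange(1, T + 1)
    s = i[:, None] + i[None, :]
    Z = 2.0 / (s**3 - s)
    lam, V = jnp.linalg.eigh(Z)
    return V[:, -k:] * lam[-k:] ** 0.25        # top-k filters, scaled by sigma^(1/4)

def features(u, filters):
    """u: (T, d) input sequence -> (T, k*d) filtered features."""
    T, d = u.shape
    k = filters.shape[1]
    feats = jnp.zeros((T, k, d))
    for t in range(T):
        window = u[max(0, t - filters.shape[0] + 1): t + 1][::-1]  # recent first
        feats = feats.at[t].set(filters[: window.shape[0]].T @ window)
    return feats.reshape(T, k * d)

def fit_linear(u, y, k=10):
    X = features(u, spectral_filters(u.shape[0], k))
    W, *_ = jnp.linalg.lstsq(X, y)             # least-squares readout
    return W
```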