Elad Hazan
For publications, courses, and other material, please see:
https://www.ehazan.com
Authored Publications
Abstract
We study boosting algorithms under the assumption that the given weak learner outputs hypotheses from a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are “rules-of-thumb” from an “easy-to-learn class”. Formally, we assume the class of weak hypotheses has bounded VC dimension. We focus on two main questions:
(i) Oracle complexity: We show that this restriction makes it possible to circumvent a lower bound due to Freund and Schapire ('95, '12) on the number of calls to the weak learner. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the boosting procedure we propose circumvents the lower bound by using more complex aggregation rules, and we show this to be necessary.
(ii) Expressivity: Which tasks can be learned by boosting a base class of bounded VC dimension? Can concepts that are “far away” from the class be learned? Toward answering this question, we identify a combinatorial-geometric parameter that quantifies the expressivity of a base class when used as part of a boosting procedure. Along the way, we establish and exploit connections with discrepancy theory.
Multiclass Boosting and the Cost of Weak Learning
Indraneel Mukherjee
Nataly Brukhim
Robert Schapire
Shay Moran
Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Abstract
Boosting is an algorithmic approach based on the idea of combining weak, moderately inaccurate hypotheses into a strong, accurate one. In this work we study multiclass boosting with a possibly large number of classes or categories. Multiclass boosting can be formulated in various ways. Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an ``easy-to-learn'' base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss. This is in contrast with other, more complicated losses that have often been considered in the past. The goal of the overall boosting algorithm is then to learn a combination of weak hypotheses by repeatedly calling the weak learner.

We study the resources required for boosting, especially how they depend on $k$, the number of classes. For the booster, these include the number of examples needed for learning and the number of oracle calls to the weak learner. For the weak learner, we measure the required accuracy, that is, how close the returned hypothesis must be to the best in the base class, or alternatively, the number of unweighted samples the boosting algorithm feeds the weak learner in each round.

We find that the boosting algorithm itself requires only $O(\log k)$ samples, as we show by analyzing a variant of AdaBoost for our setting. In stark contrast, assuming typical limits on the number of weak-learner calls, we prove that the number of samples required by any weak learner is at least polynomial in $k$, exponentially more than the number of samples needed by the booster. Alternatively, we prove that the weak learner's accuracy parameter must be smaller than an inverse polynomial in $k$, showing that the returned weak hypotheses must be nearly the best in their class when $k$ is large. We also prove a trade-off between the number of oracle calls and the resources required of the weak learner: the fewer the calls to the weak learner, the more that is demanded of it on each call.
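The booster analyzed above is a variant of AdaBoost. For intuition only, here is a minimal sketch of classic binary AdaBoost with decision stumps (the standard reweight-and-vote scheme, not the paper's multiclass variant):

```python
import numpy as np

def stump_learner(X, y, w):
    """Weak learner: exhaustively pick the threshold stump with lowest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]  # (feature, threshold, sign)

def stump_predict(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, rounds=10):
    """Binary AdaBoost: reweight hard examples, combine stumps by a weighted vote."""
    n = len(y)
    w = np.ones(n) / n
    ensemble = []
    for _ in range(rounds):
        stump = stump_learner(X, y, w)
        pred = stump_predict(stump, X)
        err = max(np.sum(w * (pred != y)), 1e-12)  # clamp to avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)  # upweight mistakes, downweight correct examples
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)
```

The final predictor is a weighted majority vote of the stumps; the paper's oracle-complexity result (above) shows that in the bounded-VC setting, richer aggregation rules than such votes can be necessary.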
Abstract
We study the question of how to aggregate controllers for dynamical systems in order to improve their performance. To this end, we propose a framework of boosting for online control. Our main result is an efficient boosting algorithm that combines weak controllers into a provably more accurate one. Empirical evaluation on a host of control settings supports our theoretical findings.
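For flavor, here is a generic experts-style way to aggregate controllers (a multiplicative-weights sketch, not the paper's boosting algorithm): each round, play a weighted average of the controllers' proposed actions and downweight controllers whose own proposals would have incurred high cost. The names `controllers` and `costs` are illustrative.

```python
import numpy as np

def aggregate_controllers(controllers, costs, T, eta=0.5):
    """Multiplicative-weights aggregation of K controllers over T rounds.
    controllers: list of callables t -> proposed action (scalar).
    costs: callable (t, action) -> cost of playing that action at time t."""
    K = len(controllers)
    w = np.ones(K) / K
    total = 0.0
    for t in range(T):
        proposals = np.array([c(t) for c in controllers])
        action = w @ proposals                # play the weighted-average action
        total += costs(t, action)
        losses = np.array([costs(t, a) for a in proposals])
        w *= np.exp(-eta * losses)            # exponential downweighting
        w /= w.sum()
    return total, w
```

The weights concentrate on the controllers whose proposals incur low cost, so the aggregate tracks the best of them over time.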
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking
Alex Jiawen Yu
Anirudha Majumdar
Cyril Zhang
John Olof Hallman
Karan Singh
Paula Nicoleta Gradu
Udaya Ghai
Differentiable computer vision, graphics, and physics in machine learning workshop at NeurIPS 2020 (to appear)
Abstract
We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite.
The introduced environments allow auto-differentiation through the simulation dynamics, and thereby permit fast training of controllers. The library features several popular environments, including classical control settings from OpenAI Gym \cite{brockman2016openai}. We also provide a novel differentiable environment, based on deep neural networks, that simulates medical ventilation.
We give several use cases of new scientific results obtained with the library: a medical ventilator simulator and controller, an adaptive control method for time-varying linear dynamical systems, and new gradient-based methods for control of linear dynamical systems with adversarial perturbations.
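To illustrate what "auto-differentiation through the simulation dynamics" buys, here is a hand-rolled sketch of the gradient a framework would compute automatically: reverse-mode differentiation of a quadratic rollout cost through linear dynamics (a toy stand-in, not Deluca's API; the matrices and cost are hypothetical).

```python
import numpy as np

def rollout_cost(A, B, Q, R, x0, U):
    """Simulate x_{t+1} = A x_t + B u_t and accumulate the quadratic cost
    sum_t (x_t' Q x_t + u_t' R u_t)."""
    x, J = x0, 0.0
    for u in U:
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

def rollout_grad(A, B, Q, R, x0, U):
    """Reverse-mode gradient of the rollout cost w.r.t. the controls U,
    via the adjoint recursion lam_t = 2 Q x_t + A' lam_{t+1}."""
    T = len(U)
    xs = [x0]
    for u in U:                           # forward pass: record the trajectory
        xs.append(A @ xs[-1] + B @ u)
    lam = np.zeros_like(x0)               # dJ/dx_T (no terminal cost)
    grads = [None] * T
    for t in reversed(range(T)):          # backward pass
        grads[t] = 2 * R @ U[t] + B.T @ lam
        lam = 2 * Q @ xs[t] + A.T @ lam
    return np.array(grads)
```

A differentiable environment makes this backward pass implicit, so controllers can be trained by plain gradient descent on rollout cost.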
Efficient Full-Matrix Adaptive Regularization
Naman Agarwal
Brian Anderson Bullins
Karan Singh
Cyril Zhang
Yi Zhang
ICML (2019)
Abstract
Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Due to the large number of parameters in machine learning problems, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization in order to make it practical and effective. We also provide a novel theoretical analysis of adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, is the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show the improved convergence rate of GGT across a variety of synthetic tasks and standard deep learning benchmarks.
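A sketch of the low-rank trick, assuming the preconditioner is built from a window G of the r most recent gradients (d × r, with r much smaller than d): applying (G Gᵀ + εI)^(-1/2) to a gradient needs only the thin SVD of G, never the d × d matrix.

```python
import numpy as np

def precondition(G, g, eps=1e-4):
    """Apply (G G^T + eps*I)^{-1/2} to gradient g, where G is d x r with r << d.
    Uses the thin SVD of G, costing O(d r^2) instead of O(d^3)."""
    U, s, _ = np.linalg.svd(G, full_matrices=False)  # U: d x r, s: r singular values
    coef = U.T @ g                                   # component of g in span(G)
    inside = U @ (coef / np.sqrt(s**2 + eps))        # rescale inside the span
    outside = (g - U @ coef) / np.sqrt(eps)          # orthogonal complement: eigenvalue eps
    return inside + outside
```

This works because G Gᵀ + εI has eigenvalues sᵢ² + ε on the column space of G and ε on its orthogonal complement, so the inverse square root acts separately on the two subspaces.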
Abstract
We study the relationship between the notions of differentially private learning and online learning in games. Several recent works have shown that differentially private learning implies online learning, but an open problem of Neel, Roth, and Wu \cite{NeelAaronRoth2018} asks whether this implication is efficient. Specifically, does an efficient differentially private learner imply an efficient online learner? In this paper we resolve this open question in the context of pure differential privacy. We derive an efficient black-box reduction from differentially private learning to online learning from expert advice.
Abstract
We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is regret: we desire an online control procedure that performs nearly as well as a procedure with full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two ways: our model allows for adversarial noise in the dynamics, and it allows for general convex costs.
Abstract
We study the optimal regret bounds for control in linear dynamical systems with adversarially changing strongly convex cost functions. This framework includes several well-studied and influential algorithms, such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret that scales as $O(\sqrt{T})$, where $T$ is the time horizon, or number of iterations.
We show that the optimal regret in this fundamental setting can be significantly smaller, scaling as $O(\mathrm{poly}(\log T))$, closing the gap in the literature between known upper and lower bounds. This regret bound is achieved by an efficient iterative gradient-based method.
Abstract
Suppose an agent is in a (possibly unknown) Markov decision process in the absence of a reward signal. What might we hope the agent can efficiently learn to do? One natural, intrinsically defined objective is for the agent to learn a policy which induces a distribution over the state space that is as uniform as possible, as measured in an entropic sense. We provide an efficient algorithm to construct such a maximum-entropy exploratory policy when given access to a black-box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting, where we have sample-based access to the MDP, our proposed algorithm is provably efficient in terms of both sample size and computational complexity. Key to our algorithmic methodology is the conditional gradient method (a.k.a. the Frank-Wolfe algorithm), used with an approximate MDP solver.
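The Frank-Wolfe idea can be seen in a stripped-down toy: maximize the entropy of a distribution over the simplex, where the linear-maximization step just picks the best coordinate. In the paper's setting that step is instead a call to the (approximate) MDP planning oracle, which returns a policy rather than a simplex vertex; this sketch only conveys the update.

```python
import numpy as np

def max_entropy_fw(d0, steps=200):
    """Frank-Wolfe on the probability simplex, maximizing entropy H(d).
    Each step moves toward the vertex with the largest entropy gradient."""
    d = d0.copy()
    for t in range(steps):
        grad = -(np.log(d + 1e-12) + 1.0)   # gradient of H(d) = -sum d log d
        j = np.argmax(grad)                  # linear oracle: best vertex e_j
        vertex = np.zeros_like(d)
        vertex[j] = 1.0
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        d = (1 - gamma) * d + gamma * vertex
    return d
```

The iterate is always a convex combination of vertices (in the MDP setting, a mixture of policies), which is exactly why the method composes with a planning oracle.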
Abstract
We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle. In the general setting of prediction with expert advice, [11] established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application, we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, which include GANs (generative adversarial networks) as a special case.
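To make the oracle model concrete, here is a generic Follow-the-Perturbed-Leader sketch (an illustration of oracle-based online learning in general, not the paper's specific reduction): the only computational primitive is an offline argmin over cumulative losses, here brute-forced over a small hypothetical candidate set.

```python
import numpy as np

def ftpl(candidates, loss_fns, rng, sigma=1.0):
    """Follow-the-Perturbed-Leader over a finite candidate set.
    Each round, call the 'offline oracle' (here: brute-force argmin) on the
    perturbed cumulative losses, play its answer, then observe the losses."""
    n = len(candidates)
    cum = np.zeros(n)      # cumulative loss of each candidate so far
    total = 0.0
    for loss in loss_fns:
        pert = rng.normal(0.0, sigma, size=n)
        i = np.argmin(cum + pert)                    # offline oracle call
        vals = np.array([loss(c) for c in candidates])
        total += vals[i]                             # pay the chosen candidate's loss
        cum += vals
    return total
```

In the oracle model proper, the brute-force argmin is replaced by a black-box offline optimizer over a possibly infinite hypothesis class; the perturbation is what converts offline optimization into a low-regret online strategy.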