Naman Agarwal

Authored Publications
    Abstract: In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the N users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering setting -- the question is whether we can design algorithms with significantly lower sample complexity than ones that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively across the N user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When N is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space, which represents an exponential reduction (in the state-space size) compared to standard "non-collaborative" algorithms.
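    The sample savings come entirely from the low-rank structure: if the N x |S||A| reward matrix has rank r much smaller than N, a small number of observed entries determines the rest. As a minimal illustration of that principle (not the paper's exploration algorithm; all dimensions and noise levels below are hypothetical), per-user reward estimates can be denoised jointly by projecting onto the top-r singular subspace:

```python
import jax
import jax.numpy as jnp

N, SA, r = 100, 50, 3   # hypothetical: users, state-action pairs, rank
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
R = jax.random.normal(k1, (N, r)) @ jax.random.normal(k2, (r, SA))  # true low-rank rewards
R_noisy = R + 0.1 * jax.random.normal(k3, R.shape)                  # noisy per-user estimates

# Pool information across users: project onto the top-r singular subspace.
u, s, vt = jnp.linalg.svd(R_noisy, full_matrices=False)
R_hat = (u[:, :r] * s[:r]) @ vt[:r]
print(jnp.linalg.norm(R_hat - R) / jnp.linalg.norm(R))  # relative error
```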
    Abstract: Q-learning is a popular reinforcement learning (RL) algorithm which is widely used in practice with function approximation (Mnih et al., 2015). In contrast, existing theoretical results are pessimistic about Q-learning. For example, Baird (1995) shows that Q-learning does not converge even with linear function approximation for linear MDPs. Furthermore, even for tabular MDPs with synchronous updates, Q-learning was shown to have sub-optimal sample complexity (Li et al., 2021; Azar et al., 2013). The goal of this work is to bridge the gap between the practical success of Q-learning and these relatively pessimistic theoretical results. The starting point of our work is the observation that, in practice, Q-learning is used with two important modifications: (i) training with two networks, an online network and a target network, simultaneously (online target learning, or OTL), and (ii) experience replay (ER) (Mnih et al., 2015). While both have been observed to play a significant role in the practical success of Q-learning, a thorough theoretical understanding of how these two modifications improve its convergence behavior has been missing from the literature. By carefully combining Q-learning with OTL and reverse experience replay (RER, a form of experience replay), we present novel methods Q-Rex and Q-RexDaRe (Q-Rex + data reuse). We show that Q-Rex efficiently finds the optimal policy for linear MDPs and we provide non-asymptotic bounds on its sample complexity. Furthermore, we demonstrate that Q-RexDaRe in fact achieves near-optimal sample complexity in the tabular setting, improving upon the existing results for vanilla Q-learning.
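    As a minimal tabular sketch of the two ingredients named above, an online/target table pair combined with reverse-order replay of a trajectory, the toy update below is illustrative only; it is not the paper's Q-Rex algorithm, and the trajectory format is hypothetical:

```python
import jax.numpy as jnp

def reverse_replay_update(q_online, q_target, trajectory, lr=0.5, gamma=0.99):
    # Replay one collected trajectory in *reverse* order, bootstrapping from
    # the frozen target table rather than the online one (OTL + RER).
    for (s, a, r, s_next) in reversed(trajectory):
        td_target = r + gamma * jnp.max(q_target[s_next])
        q_online = q_online.at[s, a].set((1 - lr) * q_online[s, a] + lr * td_target)
    return q_online

q_online = q_target = jnp.zeros((4, 2))        # 4 states, 2 actions (toy)
traj = [(0, 1, 0.0, 1), (1, 0, 1.0, 2)]        # hypothetical (s, a, r, s') tuples
q_online = reverse_replay_update(q_online, q_target, traj)
# Periodically the target table is synced: q_target = q_online.
```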
    Adaptive Gradient Methods at the Edge of Stability
    Behrooz Ghorbani
    David Cardoze
    Jeremy Cohen
    Justin Gilmer
    Shankar Krishnan
    NeurIPS 2022 (2022) (to appear)
    Abstract: Little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large-batch settings. Specifically, we show that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at the stability threshold of a related non-adaptive algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the "Edge of Stability," their behavior in this regime differs in a crucial way from that of their non-adaptive counterparts: whereas non-adaptive algorithms are forced to remain in low-curvature regions of the loss landscape, adaptive gradient methods often advance into high-curvature regions while adapting the preconditioner to compensate. We believe that our findings will serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning.
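    The quantity tracked here, the largest eigenvalue of the preconditioned Hessian, can be estimated without materializing the Hessian by combining Hessian-vector products with power iteration. The sketch below assumes a toy least-squares loss and a diagonal preconditioner built from a hypothetical Adam second-moment estimate `nu`; the paper's exact preconditioner and measurement protocol may differ:

```python
import jax
import jax.numpy as jnp

X = jax.random.normal(jax.random.PRNGKey(0), (64, 10))
y = jax.random.normal(jax.random.PRNGKey(1), (64,))

def loss(w):
    return jnp.mean((X @ w - y) ** 2)

def hvp(w, v):
    # Hessian-vector product via forward-over-reverse autodiff.
    return jax.jvp(jax.grad(loss), (w,), (v,))[1]

def top_eig_preconditioned(w, nu, iters=100, eps=1e-8):
    # Power iteration on P^{-1/2} H P^{-1/2}, where P = diag(sqrt(nu) + eps)
    # mimics Adam's diagonal preconditioner.
    p = 1.0 / jnp.sqrt(jnp.sqrt(nu) + eps)
    v = jnp.ones_like(w) / jnp.sqrt(1.0 * w.size)
    lam = 0.0
    for _ in range(iters):
        hv = p * hvp(w, p * v)
        lam = jnp.vdot(v, hv)
        v = hv / jnp.linalg.norm(hv)
    return lam

eta = 1e-3
nu = 0.01 * jnp.ones(10)   # hypothetical second-moment state
print(top_eig_preconditioned(jnp.zeros(10), nu), "vs. threshold", 38 / eta)
```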
    Machine learning for medical ventilator control
    Cyril Zhang
    Daniel Cohen
    Edgar Minasyan
    Elad Hazan
    Julienne LaChance
    Karan Singh
    Manuel Schottdorf
    Paula Nicoleta Gradu
    Tom Zajdel
    Udaya Ghai
    ML4H (2021) (to appear)
    Abstract: We consider the problem of controlling a medical ventilator for pressure-controlled ventilation. The goal is to control airflow in and out of a sedated patient's lung according to a trajectory of airway pressures specified by a clinician. PID control, either hand-tuned or tuned using lung-breath simulators based on gas dynamics, is the state of the art for ventilators. We consider a data-driven machine learning methodology to tackle this problem by first training a simulator on collected data and then using this simulator to train controllers based on artificial neural networks. We show that our controller is able to track significantly better than PID controllers on FDA-specified benchmarks.
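    The two-stage recipe, first fitting a simulator to logged data and then training a controller by differentiating the tracking error through it, can be sketched in a few lines. The toy below uses a hypothetical scalar linear "lung" and a two-parameter controller; the paper's simulator and controllers are neural networks:

```python
import jax
import jax.numpy as jnp

# Stage 1: fit a toy linear simulator p' ~ a*p + b*u from logged (p, u, p') data.
p = jax.random.uniform(jax.random.PRNGKey(0), (500,))
u = jax.random.uniform(jax.random.PRNGKey(1), (500,))
p_next = 0.9 * p + 0.3 * u + 0.01 * jax.random.normal(jax.random.PRNGKey(2), (500,))
a_hat, b_hat = jnp.linalg.lstsq(jnp.stack([p, u], axis=1), p_next)[0]

# Stage 2: train a controller u_t = w0 + w1 * (target_t - p_t) through the simulator.
def rollout_loss(w, targets):
    def step(p_t, target_t):
        u_t = w[0] + w[1] * (target_t - p_t)
        p_t = a_hat * p_t + b_hat * u_t
        return p_t, (p_t - target_t) ** 2
    _, errs = jax.lax.scan(step, 0.0, targets)
    return errs.mean()

targets = 1.0 + jnp.sin(jnp.linspace(0.0, 3.14, 50))  # clinician-specified pressures (toy)
w = jnp.zeros(2)
for _ in range(200):
    w = w - 0.1 * jax.grad(rollout_loss)(w, targets)
```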
    Abstract: Multiclass logistic regression is a fundamental task in machine learning with applications in classification and boosting. Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially in secondary problem parameters, such as the norm of the predictors in the comparison class. While Foster et al. introduced a statistically optimal algorithm, it is computationally intractable in practice, as its per-iteration running time is a large polynomial in the time horizon and the dimension of the input feature vectors. It has remained an open problem whether algorithms exist that are both statistically and computationally efficient. Jezequel et al. answered this question for binary logistic regression by introducing AIOLI, an algorithm based on the online Newton step. However, their technique fails to generalize to more than two classes, and it has remained open whether their algorithm can be applied to practical applications of logistic regression, such as bandit multiclass learning or online boosting. In this paper, we develop a new algorithm, FOLKLORE, which runs significantly faster than the algorithm of Foster et al. -- the running time per iteration scales quadratically in the dimension -- at the cost of a linear dependence on the norm of the predictors in the regret bound. This yields the first practical algorithm for online multiclass logistic regression, resolving an open problem of Jezequel et al. by extending AIOLI to the multiclass setting. Furthermore, we show that our algorithm can be applied to online bandit multiclass prediction and online multiclass boosting, yielding more practical algorithms for both problems than those of Foster et al., with similar performance guarantees. Finally, we also provide an online-to-batch conversion result for our algorithm.
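    For readers unfamiliar with the setting, the online protocol is: observe features, predict over k classes, incur log loss, update. The sketch below uses plain online gradient descent, which conveys the protocol but not FOLKLORE's Newton-style update or its regret guarantee; the stream format is hypothetical:

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def logloss(W, x, y):
    # Multiclass logistic (softmax) loss for a single example.
    logits = W @ x
    return logsumexp(logits) - logits[y]

def online_logistic(stream, d, k, lr=0.1):
    # Plain online gradient descent -- illustrates the problem setting only;
    # FOLKLORE uses second-order updates costing O(d^2) per iteration.
    W, total = jnp.zeros((k, d)), 0.0
    for x, y in stream:
        total += logloss(W, x, y)               # incur loss, then update
        W = W - lr * jax.grad(logloss)(W, x, y)
    return W, total
```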
    Abstract: We study the question of how to aggregate controllers for dynamical systems in order to improve their performance. To this end, we propose a framework of boosting for online control. Our main result is an efficient boosting algorithm that combines weak controllers into a provably more accurate one. Empirical evaluation on a host of control settings supports our theoretical findings.
    Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking
    Alex Jiawen Yu
    Anirudha Majumdar
    Cyril Zhang
    John Olof Hallman
    Karan Singh
    Paula Nicoleta Gradu
    Udaya Ghai
    Differentiable computer vision, graphics, and physics in machine learning workshop at NeurIPS 2020 (2020) (to appear)
    Abstract: We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite. The introduced environments allow auto-differentiation through the simulation dynamics and thereby permit fast training of controllers. The library features several popular environments, including classical control settings from OpenAI Gym (Brockman et al., 2016). We also provide a novel differentiable environment, based on deep neural networks, that simulates medical ventilation. We give several use cases of new scientific results obtained using the library, including a medical ventilator simulator and controller, an adaptive control method for time-varying linear dynamical systems, and new gradient-based methods for control of linear dynamical systems with adversarial perturbations.
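    The payoff of differentiable environments is that controls can be optimized by back-propagating through the rollout itself. A minimal sketch with toy damped-pendulum dynamics, written directly in JAX and not using Deluca's actual API:

```python
import jax
import jax.numpy as jnp

def pendulum_step(state, u, dt=0.05):
    # Toy differentiable dynamics: a damped pendulum with torque input u.
    theta, omega = state
    omega = omega + dt * (-jnp.sin(theta) - 0.1 * omega + u)
    return jnp.array([theta + dt * omega, omega])

def final_cost(actions):
    state = jnp.array([jnp.pi, 0.0])        # start hanging down
    for u in actions:
        state = pendulum_step(state, u)
    return state[0] ** 2 + state[1] ** 2    # distance from upright at rest

# The rollout is differentiable end to end, so an action sequence can be
# optimized directly with jax.grad.
actions = jnp.zeros(100)
for _ in range(200):
    actions = actions - 0.1 * jax.grad(final_cost)(actions)
```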
    Stochastic Optimization with Laggard Data Pipelines
    Cyril Zhang
    Kunal Talwar
    Rohan Anil
    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020) (to appear)
    Abstract: State-of-the-art optimization has increasingly moved toward massively parallel pipelines with extremely large batches. As a consequence, the performance bottleneck is shifting towards CPU- and disk-bound data loading and preprocessing, as opposed to hardware-accelerated backpropagation. In this regime, a recently proposed approach is data echoing (Choi et al., 2019), which takes repeated gradient steps on the same batch. We provide the first convergence analysis of data-echoing-based extensions of ubiquitous optimization methods, exhibiting provable improvements over their synchronous counterparts. Specifically, we show that asynchronous batch reuse can magnify the gradient signal in a stochastic batch without harming the statistical rate.
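    The mechanism is simple to state in code: each expensive batch load is amortized over several optimizer steps. A minimal sketch, assuming a toy least-squares loss and an iterable of preloaded batches (both hypothetical):

```python
import jax
import jax.numpy as jnp

def loss(w, X, y):
    return jnp.mean((X @ w - y) ** 2)

def train_with_echoing(w, batches, lr=0.01, echo=4):
    # Data echoing: reuse each loaded batch for `echo` gradient steps, keeping
    # the accelerator busy while the CPU/disk pipeline catches up.
    for X, y in batches:            # loading/preprocessing is the bottleneck
        for _ in range(echo):
            w = w - lr * jax.grad(loss)(w, X, y)
    return w
```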
    Abstract: We study the optimal regret bounds for control in linear dynamical systems with adversarially changing strongly convex cost functions. This framework includes several well-studied and influential algorithms such as the Kalman filter and the linear quadratic regulator. State-of-the-art methods achieve regret which scales as $O(\sqrt{T})$, where $T$ is the time horizon, or number of iterations. We show that the optimal regret in this fundamental setting can be significantly smaller, and scales as $O(\mathrm{poly}(\log T))$, closing the gap in the literature between known upper and lower bounds. This regret bound is achieved by an efficient iterative gradient-based method.
    Abstract: We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle. In the general setting of prediction with expert advice, [11] established that in the optimization-oracle model, online learning requires exponentially more computation than statistical learning. In this paper we show that by slightly strengthening the oracle model, the online and statistical learning models become computationally equivalent. Our result holds for any Lipschitz and bounded (but not necessarily convex) function. As an application, we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games that include GANs (generative adversarial networks) as a special case.
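    To make the oracle model concrete, the sketch below runs Follow-the-Perturbed-Leader, in which each round's decision is produced by one offline-oracle call on the perturbed cumulative loss. The "oracle" here is a hypothetical brute-force minimizer over a finite candidate grid, illustrative only and not the paper's strengthened oracle:

```python
import jax
import jax.numpy as jnp

def offline_oracle(objective, candidates):
    # Stand-in for an offline optimization oracle: brute-force argmin over a
    # finite candidate set (the paper's oracle is an abstract primitive).
    vals = jnp.array([objective(c) for c in candidates])
    return candidates[int(jnp.argmin(vals))]

def ftpl(losses, candidates, key, eta=1.0):
    # Follow-the-Perturbed-Leader: one oracle call per round, minimizing the
    # cumulative past loss plus a random linear perturbation.
    plays = []
    for t in range(len(losses)):
        key, sub = jax.random.split(key)
        sigma = jax.random.normal(sub, candidates.shape[1:])
        obj = lambda x, s=sigma, t=t: sum(l(x) for l in losses[:t]) + eta * s @ x
        plays.append(offline_oracle(obj, candidates))
    return plays
```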
    Online Control with Adversarial Disturbances
    Brian Anderson Bullins
    Sham Kakade
    Karan Singh
    ICML (2019)
    Abstract: We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that performs nearly as well as a procedure with full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in that our model allows for adversarial noise in the dynamics and for general convex costs.
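    A common policy class in this line of work expresses the action as a stabilizing linear term plus a learned linear function of recently observed disturbances, with the coefficients updated by online gradient descent. The scalar sketch below is illustrative only: the dynamics, cost, and step sizes are hypothetical, and the paper's algorithm uses a more careful surrogate loss and analysis:

```python
import jax
import jax.numpy as jnp

a, b, k, H, lr = 0.9, 1.0, 0.5, 5, 0.01      # hypothetical scalar system

def action(M, past_w, x):
    # Stabilizing term plus a learned function of the last H disturbances.
    return -k * x + M @ past_w

def run(T=500, key=jax.random.PRNGKey(0)):
    M, x, past_w = jnp.zeros(H), 0.0, jnp.zeros(H)
    for _ in range(T):
        key, sub = jax.random.split(key)
        w = jax.random.uniform(sub, minval=-1.0, maxval=1.0)   # adversarial stand-in
        u = action(M, past_w, x)
        cost = lambda M: x ** 2 + action(M, past_w, x) ** 2    # instantaneous cost
        M = M - lr * jax.grad(cost)(M)                          # online gradient step
        x = a * x + b * u + w                                   # system evolves
        past_w = jnp.roll(past_w, 1).at[0].set(w)
    return M
```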