Nevena Lazic
Authored Publications
Robotic Table Tennis: A Case Study into a High Speed Learning System
Jon Abelian
Saminda Abeyruwan
Michael Ahn
Justin Boyd
Erwin Johan Coumans
Omar Escareno
Wenbo Gao
Navdeep Jaitly
Juhana Kangaspunta
Satoshi Kataoka
Gus Kouretas
Yuheng Kuang
Corey Lynch
Thinh Nguyen
Ken Oslund
Barney J. Reed
Anish Shankar
Avi Singh
Grace Vesom
Peng Xu
Robotics: Science and Systems (2023)
Preview abstract
We present a deep-dive into a learning robotic system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized and novel perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of ablation studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, and sensitivity to policy hyper-parameters and choice of action space. A video demonstrating the components of our system and details of experimental results is included in the supplementary material.
A Maximum-entropy Approach to Off-policy Evaluation in Average-reward MDPs
Dong Yin
Mehrdad Farajtabar
Nir Levine
Dilan Gorur
Chris Harris
Neural Information Processing Systems (NeurIPS) (2020)
Preview abstract
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e. where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
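The exponential-family form mentioned in the abstract follows from a standard Lagrangian argument. As a generic sketch only (this is the textbook maximum-entropy result, not necessarily the paper's exact formulation; the symbols d, φ, c, and θ are notation assumed here), maximizing entropy under linear feature-moment constraints gives:

```latex
\begin{aligned}
\max_{d}\; & -\sum_{z} d(z)\,\log d(z) \\
\text{s.t.}\; & \sum_{z} d(z)\,\phi(z) = c, \qquad \sum_{z} d(z) = 1 \\
\Longrightarrow\; & d_\theta(z) = \frac{\exp\!\big(\theta^\top \phi(z)\big)}{\sum_{z'} \exp\!\big(\theta^\top \phi(z')\big)}
\end{aligned}
```

Here z ranges over state-action pairs, φ(z) is the feature vector, c is the feature expectation to be matched, and θ is the vector of Lagrange multipliers for the moment constraints.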
Robotic Table Tennis with Model-Free Reinforcement Learning
Wenbo Gao
Navdeep Jaitly
International Conference on Intelligent Robots and Systems (IROS) (2020)
Preview abstract
We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100 Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned curriculum learning on the task and rewards, policies are capable of developing multi-modal styles, specifically forehand and backhand strokes, whilst achieving an 80% return rate on a wide range of ball throws. We observe that multi-modality does not require any architectural priors, such as multi-head architectures or hierarchical policies.
Politex: Regret Bounds for Policy Iteration using Expert Prediction
Yasin Abbasi-Yadkori
Peter Bartlett
Kush Bhatia
Gellért Weisz
ICML (2019)
Preview abstract
We present POLITEX (POLicy ITeration with EXpert advice), a variant of policy iteration where each policy is a Boltzmann distribution over the sum of action-value function estimates of the previous policies, and analyze its regret in continuing RL problems. We assume that the value function error after running a policy for m time steps scales as E(m) = E0 + O((d/m)^{1/2}), where E0 is the worst-case approximation error and d is the number of features in a compressed representation of the state-action space. We establish that this condition is satisfied by the LSPE algorithm under certain assumptions on the MDP and policies. Under the error assumption, we show that the regret of POLITEX in uniformly mixing MDPs scales as O(d^{1/2}T^{3/4} + E0T), where O(.) hides logarithmic terms and problem-dependent constants. Thus, we provide the first regret bound for a fully practical model-free method which only scales in the number of features, and not in the size of the underlying MDP. Experiments on a queuing problem confirm that POLITEX is competitive with some of its alternatives, while preliminary results on Ms Pacman (one of the standard Atari benchmark problems) confirm the viability of POLITEX beyond linear function approximation.
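The policy form described in the abstract can be illustrated with a minimal tabular sketch. It assumes the action-value estimates are stored as arrays of rewards (higher is better) and uses a hypothetical inverse-temperature parameter `eta`; the function name and array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def politex_policy(q_estimates, eta):
    """Boltzmann policy over the sum of past action-value estimates.

    q_estimates: array-like of shape (k, n_states, n_actions), one
        action-value estimate per previous policy (k >= 1).
    eta: positive inverse temperature (hypothetical parameter name).
    Returns an array of shape (n_states, n_actions) whose rows sum to 1.
    """
    q = np.asarray(q_estimates, dtype=float)
    q_sum = q.sum(axis=0)                      # sum over previous policies
    logits = eta * q_sum
    # Subtract the per-state maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Each new phase would append one more estimate to `q_estimates`, so the policy becomes more concentrated on actions that have looked good consistently across phases.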
Preview abstract
Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as O(T^(ξ+2/3)). The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions. This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.
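For a quadratic action-value function, a policy that is greedy with respect to an average of value functions reduces to a linear feedback gain. The sketch below assumes each estimate is a symmetric matrix G with Q(x, u) = [x; u]ᵀ G [x; u] treated as a cost to be minimized; the function name and block layout are hypothetical, and the forced-exploration step is omitted.

```python
import numpy as np

def greedy_gain_from_average(g_matrices, n_x):
    """Linear gain K (so that u = K x) greedy w.r.t. the averaged Q-function.

    g_matrices: array-like of shape (k, n_x + n_u, n_x + n_u), one quadratic
        Q-function estimate per previous phase.
    n_x: state dimension; the remaining rows/columns belong to the input u.
    """
    g_avg = np.mean(np.asarray(g_matrices, dtype=float), axis=0)
    g_ux = g_avg[n_x:, :n_x]   # cross term between input and state
    g_uu = g_avg[n_x:, n_x:]   # input-input block, assumed positive definite
    # Minimizing x'G_xx x + 2 u'G_ux x + u'G_uu u over u gives u = -G_uu^{-1} G_ux x.
    return -np.linalg.solve(g_uu, g_ux)
```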
Data Center Cooling using Model-predictive Control
Tyler Lu
MK Ryu
Eehern Jay Wong
Binz Roy
Greg Imwalle
Proceedings of the Thirty-second Conference on Neural Information Processing Systems (NeurIPS-18), Montreal, QC (2018), pp. 3818-3827
Preview abstract
Despite the impressive advances in reinforcement learning (RL) algorithms, their deployment to real-world physical systems is often complicated by unexpected events and the potential for expensive failures. In this paper we describe an application of RL “in the wild” to the task of regulating temperatures and airflow inside a large-scale data center (DC). Adopting a data-driven model-based approach, we demonstrate that an RL agent is able to effectively and safely regulate conditions inside a server floor in just a few hours, while improving operational efficiency relative to existing controllers.
Preview abstract
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee O(√T) regret under mild assumptions, where T is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to “strongly stable” policies that mix exponentially fast to a steady state.
Preview abstract
Data-independent methods for dimensionality reduction such as random projections, sketches, and feature hashing have become increasingly popular in recent years. These methods often seek to reduce dimensionality while preserving the hypothesis class, resulting in inherent lower bounds on the size of projected data. For example, preserving linear separability requires Ω(1/γ²) dimensions, where γ is the margin, and in the case of polynomial functions, the number of required dimensions has an exponential dependence on the polynomial degree. Despite these limitations, we show that the dimensionality can be reduced further while maintaining performance guarantees, using improper learning with a slightly larger hypothesis class. In particular, we show that any sparse polynomial function of a sparse binary vector can be computed from a compact sketch by a single-layer neural network, where the sketch size has a logarithmic dependence on the polynomial degree. A practical consequence is that networks trained on sketched data are compact, and therefore suitable for settings with memory and power constraints. We empirically show that our approach leads to networks with fewer parameters than related methods such as feature hashing, at equal or better performance.
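For context, the feature-hashing baseline named in the abstract can be sketched as follows. This illustrates the generic signed hashing trick for a sparse binary vector, not the sketch construction proposed in the paper; the function name, the use of MD5, and the `seed` parameter are assumptions made for the example.

```python
import hashlib
import numpy as np

def feature_hash(active_indices, m, seed=0):
    """Map a sparse binary vector to m dimensions with signed feature hashing.

    active_indices: iterable of integer indices whose value is 1.
    m: size of the hashed representation.
    seed: hypothetical salt, so different seeds give independent hashes.
    """
    out = np.zeros(m)
    for i in active_indices:
        digest = hashlib.md5(f"{seed}:{i}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "little") % m   # target coordinate
        sign = 1.0 if digest[8] & 1 else -1.0               # random sign
        out[bucket] += sign
    return out
```

Colliding indices share a coordinate and their signs may cancel, which is the source of the distortion that motivates comparing against more compact sketches.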
Collective Entity Resolution with Multi-Focal Attention
Soumen Chakrabarti
Michael Ringaard
ACL (2016)
Preview abstract
Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks.
Preview abstract
We propose a new approach to the task of fine-grained entity type classification based on label embeddings that allows for information sharing among related labels. Specifically, we learn an embedding for each label and each feature such that labels which frequently co-occur are close in the embedded space. We show that it outperforms state-of-the-art methods on two fine-grained entity-classification benchmarks and that the model can exploit the finer-grained labels to improve classification of standard coarse types.