Gábor Bartók
Gábor has been a software engineer at Google Zürich since September 2014. Prior to that, he was a Postdoctoral Fellow at ETH Zürich in the Learning & Adaptive Systems Group (LAS) under the supervision of Prof. Dr. Andreas Krause.
Gábor completed his PhD in 2012 at the University of Alberta, where he worked in the Reinforcement Learning and Artificial Intelligence (RLAI) group, supervised by Csaba Szepesvári. His PhD thesis received the Best PhD Dissertation Award from the Canadian Artificial Intelligence Association.
Research Areas
Authored Publications
SmartChoices: Augmenting Software with Learned Implementations
Eric Yawei Chen
Nikhil Sarda
arXiv (2023)
We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large scale industrial systems.
Flexible Multi-task Networks by Learning Parameter Allocation
Krzysztof Maziarz
Jesse Berent
ICLR 2021 Workshop on Neural Architecture Search (2021)
Multi-task neural networks, when trained successfully, can learn to leverage related concepts from different tasks by using weight sharing. Sharing parameters between highly unrelated tasks can hurt both of them, so a strong multi-task model should be able to control the amount of weight sharing between pairs of tasks, and flexibly adapt it to their relatedness. In recent works, routing networks have shown strong performance in a variety of settings, including multi-task learning. However, optimization difficulties often prevent routing models from unlocking their full potential. In this work, we propose a novel routing method, specifically designed for multi-task learning, where routing is optimized jointly with the model parameters by standard backpropagation. We show that it can discover related pairs of tasks, and improve accuracy over strong baselines. In particular, on multi-task learning for the Omniglot dataset our method reduces the state-of-the-art error rate by 17%.
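As an illustration of the soft-routing idea the abstract describes, here is a minimal forward-pass sketch. All names and numbers are hypothetical; in the paper's setting the routing logits are trained jointly with the module parameters by backpropagation, which is omitted here for brevity:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of routing logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def routed_forward(x, modules, task_logits):
    """Soft routing for one task: mix the outputs of shared modules using
    that task's (trainable) routing logits. Unrelated tasks can learn
    near-one-hot weights and thus avoid harmful parameter sharing."""
    weights = softmax(task_logits)
    return sum(w * m(x) for w, m in zip(weights, modules))

# Two shared modules and one task whose (hypothetical) logits route it
# almost entirely through the first module.
modules = [lambda x: 2.0 * x, lambda x: x + 1.0]
y = routed_forward(3.0, modules, task_logits=[2.0, -2.0])
# y lies close to modules[0](3.0) == 6.0, far from modules[1](3.0) == 4.0
```

Because the mixing weights are a differentiable function of the logits, gradients of the task loss flow into both the modules and the routing decision, which is what lets routing be optimized by standard backpropagation.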
Fast Task-Aware Architecture Inference
Anja Hauth
Jesse Berent
https://arxiv.org/abs/1902.05781 (2019)
Neural architecture search has been shown to hold great promise towards the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To this end, we propose a novel gradient-based framework for efficient architecture search that shares information across several tasks. We start by training many model architectures on several related (training) tasks. When a new, unseen task is presented, the framework performs architecture inference in order to quickly identify a good candidate architecture before any model is trained on the new task. At the core of our framework lies a deep value network that can predict the performance of input architectures on a task by utilizing task meta-features and the previous model training experiments performed on related tasks. We adopt a continuous parametrization of the model architecture, which allows for efficient gradient-based optimization. Given a new task, an effective architecture is quickly identified by maximizing the estimated performance with respect to the model architecture parameters using simple gradient ascent. It is key to point out that our goal is to achieve reasonable performance at the lowest cost. We provide experimental results showing that the framework is effective while remaining computationally cheap.
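The inference step described above, gradient ascent over a continuous architecture parametrization, can be sketched as follows. A toy quadratic surrogate stands in for the deep value network; all names and numbers are hypothetical, not the paper's implementation:

```python
def architecture_inference(grad_fn, alpha0, lr=0.1, steps=200):
    """Gradient ascent on continuous architecture parameters: climb the
    predicted-performance surface of a value network to find a promising
    architecture before any model is trained on the new task."""
    alpha = list(alpha0)
    for _ in range(steps):
        alpha = [a + lr * g for a, g in zip(alpha, grad_fn(alpha))]
    return alpha

# Toy stand-in for the deep value network: predicted performance peaks at
# the (hypothetical) architecture parameters alpha* = (0.7, 0.3).
value = lambda a: -((a[0] - 0.7) ** 2 + (a[1] - 0.3) ** 2)
value_grad = lambda a: [-2.0 * (a[0] - 0.7), -2.0 * (a[1] - 0.3)]

alpha_star = architecture_inference(value_grad, [0.0, 0.0])
# alpha_star converges to approximately (0.7, 0.3)
```

The point of the sketch is the cost profile: each ascent step only queries the surrogate's gradient, so candidate architectures are ranked without a single training run on the new task.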
Ranking architectures using meta-learning
Alina Dubatovka
Jesse Berent
NeurIPS Workshop on Meta-Learning (MetaLearn 2019) (to appear)
Neural architecture search has recently attracted substantial research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources; to alleviate this, a performance prediction network was recently proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training. The performance predictor is task-aware, taking as input not only the candidate architecture but also task meta-features, and it has been designed to learn collectively from several tasks. In this work, we introduce a pairwise ranking loss for training a network to rank candidate architectures for a new, unseen task, conditioning on its task meta-features. We present experimental results showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
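A pairwise ranking loss of the kind described above can be sketched in a few lines. This uses an illustrative logistic form on the score difference; the paper's exact loss may differ:

```python
import math

def pairwise_ranking_loss(score_better, score_worse):
    """Logistic pairwise ranking loss on a pair of candidate architectures:
    small when the architecture that actually performed better receives the
    higher predicted score, large when the pair is scored in the wrong
    order. Only the relative order matters, not absolute performance."""
    margin = score_better - score_worse
    return math.log(1.0 + math.exp(-margin))

# Correctly ordered pair: small loss. Inverted pair: large loss.
good = pairwise_ranking_loss(2.0, 0.0)  # ~0.127
bad = pairwise_ranking_loss(0.0, 2.0)   # ~2.127
```

Training on such pairs pushes the network toward correct orderings of candidates, which is exactly what architecture search needs, rather than toward accurate absolute performance estimates.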
We propose a sample-efficient alternative to importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinatorial decision set so as to minimize its cumulative loss. In particular, we show that the well-known Follow-the-Perturbed-Leader (FPL) prediction method coupled with Geometric Resampling yields the first computationally efficient reduction from offline to online optimization in this setting. We provide a thorough theoretical analysis for the resulting algorithm, showing that its performance is on par with previous, inefficient solutions. Our main contribution is showing that, despite the relatively large variance induced by the GR procedure, our performance guarantees hold with high probability rather than only in expectation. As a side result, we also improve the best known regret bounds for FPL in online combinatorial optimization with full feedback, closing the perceived performance gap between FPL and exponential weights in this setting.
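The resampling trick at the heart of the abstract is simple enough to sketch. The toy code below (all names hypothetical; not the paper's implementation) estimates the importance weight 1/p of a chosen action using only sample access to the action distribution:

```python
import random

def geometric_resampling(draw_action, chosen_action, cap):
    """Estimate 1/P(chosen_action) with sample access only: count fresh
    draws from the same (unknown) distribution until the chosen action
    reappears. The count is a geometric random variable with mean 1/p;
    truncating at `cap` keeps the estimate bounded at the price of a
    small, controllable bias."""
    for k in range(1, cap + 1):
        if draw_action() == chosen_action:
            return k
    return cap

# Toy check: actions {0, 1} with P(1) = 0.25, so the true weight is 4.
rng = random.Random(0)
draw = lambda: 1 if rng.random() < 0.25 else 0
estimates = [geometric_resampling(draw, 1, cap=100) for _ in range(10_000)]
avg = sum(estimates) / len(estimates)  # close to 4, slightly below due to truncation
```

The estimator never needs the probability p itself, which is what makes it usable when the action distribution can only be sampled, and the cap is what bounds the per-round computation in the efficient FPL reduction.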