Effrosyni Kokiopoulou
Efi has been a research scientist at Google since February 2013. She joined Google as a postdoctoral researcher in September 2011. Before that she was a postdoctoral research fellow at the Seminar for Applied Mathematics (SAM) at ETH Zurich. She completed her PhD studies in December 2008 at the Signal Processing Laboratory (LTS4) of the Swiss Federal Institute of Technology (EPFL), Lausanne, under the supervision of Prof. Pascal Frossard. Before that she was with the Computer Science & Engineering Department of the University of Minnesota, USA, where she obtained her M.Sc. degree in June 2005 under the supervision of Prof. Yousef Saad. She obtained B.Eng. and MscEng. degrees in 2002 and 2003, respectively, at the Computer Engineering and Informatics Department of the University of Patras, Greece.
Authored Publications
SmartChoices: Augmenting Software with Learned Implementations
Eric Yawei Chen
Nikhil Sarda
arXiv (2023)
We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large scale industrial systems.
Transfer and Marginalize: Explaining Away Label Noise with Privileged Information
Mark Patrick Collier
Rodolphe Jenatton
Jesse Berent
ICML 2021 Workshop on Uncertainty & Robustness in Deep Learning (2021) (to appear)
Supervised learning datasets often have privileged information, in the form of features which are available at training time but not at test time, e.g., the ID of the annotator that provided the label. We argue that privileged information is useful for explaining away label noise, thereby reducing the harmful impact of noisy labels. We develop a simple and efficient method for supervised neural networks: it transfers the knowledge learned with privileged information via weight sharing and approximately marginalizes over privileged information at test time. Our method, TRAM (TRansfer and Marginalize), has the same test time computational cost as not using privileged information, and performs strongly on CIFAR-10H and ImageNet benchmarks.
Flexible Multi-task Networks by Learning Parameter Allocation
Krzysztof Maziarz
Jesse Berent
ICLR 2021 Workshop on Neural Architecture Search (2021)
Multi-task neural networks, when trained successfully, can learn to leverage related concepts from different tasks by using weight sharing. Sharing parameters between highly unrelated tasks can hurt both of them, so a strong multi-task model should be able to control the amount of weight sharing between pairs of tasks, and flexibly adapt it to their relatedness. In recent works, routing networks have shown strong performance in a variety of settings, including multi-task learning. However, optimization difficulties often prevent routing models from unlocking their full potential. In this work, we propose a novel routing method, specifically designed for multi-task learning, where routing is optimized jointly with the model parameters by standard backpropagation. We show that it can discover related pairs of tasks, and improve accuracy over strong baselines. In particular, on multi-task learning for the Omniglot dataset our method reduces the state-of-the-art error rate by 17%.
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
Mark Patrick Collier
Basil Mustafa
Rodolphe Jenatton
Jesse Berent
CVPR 2021 (2021), pp. 1551-1560
Large-scale image classification datasets often contain noisy labels. We take a principled probabilistic approach to modelling input-dependent, also known as heteroscedastic, label noise in these datasets. We place a multivariate Normal distributed latent variable on the final hidden layer of a neural network classifier. The covariance matrix of this latent variable models the aleatoric uncertainty due to label noise. We demonstrate that the learned covariance structure captures known sources of label noise between semantically similar and co-occurring classes. Compared to standard neural network training and other baselines, we show significantly improved accuracy on ImageNet ILSVRC 2012 (79.3%, +2.6%), ImageNet-21k (47.0%, +1.1%) and JFT (64.7%, +1.6%). We set a new state-of-the-art result on WebVision 1.0 with 76.6% top-1 accuracy. These datasets range from over 1M to over 300M training examples and from 1k classes to more than 21k classes. Our method is simple to use, and we provide an implementation that is a drop-in replacement for the final fully-connected layer in a deep classifier.
Deep Classifiers with Label Noise Modeling and Distance Awareness
Vincent Fortuin
Mark Patrick Collier
Florian Wenzel
James Urquhart Allingham
Jesse Berent
Rodolphe Jenatton
NeurIPS 2021 Workshop on Bayesian Deep Learning (2021) (to appear)
Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness of deep learning models, especially in safety-critical applications.
While many proposed methods focus either on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, combining these two approaches has been less well explored.
In this work, we propose to combine these two ideas to achieve a joint modeling of model (epistemic) and data (aleatoric) uncertainty.
We show that our combined model affords a favorable combination between these two complementary types of uncertainty and thus achieves good performance in-distribution and out-of-distribution on different benchmark datasets.
Preview abstract
Modelling uncertainty arising from input-dependent label noise is an increasingly important problem.
A state-of-the-art approach (Kendall and Gal, 2017) places a normal distribution over the softmax logits, where the mean and variance of this distribution are learned functions of the inputs. This approach achieves impressive empirical performance but lacks theoretical justification. We show that this model is in fact a special case of a well known and theoretically understood model in the econometrics literature.
Under this view the softmax over the logit distribution is a smooth approximation to an argmax, where the approximation is exact in the zero temperature limit. We illustrate that the softmax temperature controls a bias-variance trade-off and the optimal point on this trade-off is not always found at 1.0.
By tuning the temperature and the corresponding bias-variance trade-off, we achieve improved performance on well known image classification benchmarks, where we introduce noisy labels synthetically. For image segmentation, where input-dependent label noise naturally arises, we show that tuning the temperature increases the mean IoU on the PASCAL VOC and Cityscapes datasets by more than 1% over the state-of-the-art model and a strong baseline that does not model this noise source.
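The temperature idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes independent Gaussian noise on each logit and estimates class probabilities by Monte Carlo averaging of a temperature-scaled softmax. The function names, sample count, and seed are hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_class_probs(means, stddevs, temperature=1.0, num_samples=1000, seed=0):
    """Monte Carlo estimate of the expected softmax over noisy logits.

    Assumption for this sketch: each logit k is drawn independently from
    a normal distribution with mean means[k] and std stddevs[k].
    """
    rng = random.Random(seed)
    totals = [0.0] * len(means)
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sd) for mu, sd in zip(means, stddevs)]
        for k, p in enumerate(softmax(sample, temperature)):
            totals[k] += p
    return [t / num_samples for t in totals]
```

As the temperature approaches zero, each sampled softmax approaches a one-hot argmax, which corresponds to the zero temperature limit mentioned in the abstract; larger temperatures give smoother per-sample probabilities.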
Routing Networks with Co-training for Continual Learning
Mark Patrick Collier
Jesse Berent
ICML 2020 Workshop on Continual Learning (to appear)
Many continual learning methods can be characterized as either altering the learning algorithm in a fixed capacity neural network or dynamically growing the capacity of the network to handle new tasks. We propose to use fixed capacity sparse routing networks for continual learning. We retain the advantages of architectural solutions to the continual learning problem, in that different paths through the network can be learned for different tasks. However, we stay within the regime of fixed capacity networks, which are more realistic for real-world use cases. We find it is necessary to develop a new training method for routing networks, which we call co-training, that avoids poorly initialized experts when new tasks are presented. In initial experiments, when combined with a small episodic memory replay buffer, sparse routing networks with co-training outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.
Fast Task-Aware Architecture Inference
Anja Hauth
Jesse Berent
https://arxiv.org/abs/1902.05781 (2019)
Neural architecture search has been shown to hold great promise towards the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To this end, we propose a novel gradient-based framework for efficient architecture search by sharing information across several tasks. We start by training many model architectures on several related (training) tasks. When a new unseen task is presented, the framework performs architecture inference in order to quickly identify a good candidate architecture, before any model is trained on the new task. At the core of our framework lies a deep value network that can predict the performance of input architectures on a task by utilizing task meta-features and the previous model training experiments performed on related tasks. We adopt a continuous parametrization of the model architecture which allows for efficient gradient-based optimization. Given a new task, an effective architecture is quickly identified by maximizing the estimated performance with respect to the model architecture parameters with simple gradient ascent. It is key to point out that our goal is to achieve reasonable performance at the lowest cost. We provide experimental results showing the effectiveness of the framework despite its high computational efficiency.
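The gradient-ascent step described in the abstract above can be sketched on a toy problem. This is only an illustration under stated assumptions: the learned deep value network is replaced by a hypothetical concave quadratic whose peak (`target`) stands in for the best architecture for a task, and all names and hyperparameters are made up for this example.

```python
def value_estimate(arch, target):
    """Hypothetical stand-in for the learned value network: a concave
    quadratic that is largest when arch equals target."""
    return -sum((a - t) ** 2 for a, t in zip(arch, target))


def value_gradient(arch, target):
    """Analytic gradient of value_estimate with respect to arch."""
    return [-2.0 * (a - t) for a, t in zip(arch, target)]


def infer_architecture(initial_arch, target, learning_rate=0.1, num_steps=100):
    """Plain gradient ascent on the estimated performance with respect to
    the continuous architecture parameters."""
    arch = list(initial_arch)
    for _ in range(num_steps):
        grad = value_gradient(arch, target)
        arch = [a + learning_rate * g for a, g in zip(arch, grad)]
    return arch
```

In a real system the gradient would come from automatic differentiation through the value network, conditioned on the task meta-features, rather than from a closed-form expression.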
Ranking architectures using meta-learning
Alina Dubatovka
Jesse Berent
NeurIPS Workshop on Meta-Learning (MetaLearn 2019) (to appear)
Neural architecture search has recently attracted considerable research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources. To alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training. The performance predictor is task-aware, taking as input not only the candidate architecture but also task meta-features, and it has been designed to learn collectively from several tasks. In this work, we introduce a pairwise ranking loss for training a network able to rank candidate architectures for a new unseen task, conditioning on its task meta-features. We present experimental results showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
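A pairwise ranking loss of the kind mentioned in the abstract above can be sketched as follows. The abstract does not specify the exact form of the loss, so the logistic form used here is an assumption, and the function name and inputs (predicted scores and observed accuracies per architecture) are hypothetical.

```python
import math


def pairwise_ranking_loss(scores, accuracies):
    """Mean logistic loss over all ordered pairs with different accuracies.

    For a pair where architecture i performed better than architecture j,
    the term log(1 + exp(-(s_i - s_j))) is small when s_i exceeds s_j.
    """
    total, num_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if accuracies[i] > accuracies[j]:
                margin = scores[i] - scores[j]
                total += math.log1p(math.exp(-margin))
                num_pairs += 1
    # With no comparable pairs (all accuracies tied) the loss is defined as 0.
    return total / num_pairs if num_pairs else 0.0
```

Scores that order the architectures the same way as their observed accuracies give a lower loss than scores that reverse the order, which is the property a ranking network is trained to achieve.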
General techniques for approximate incidences and their application to the camera posing problem
Micha Sharir
Bernhard Zeisl
The 35th International Symposium on Computational Geometry (2019)
We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given n 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and where we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas.
We describe and analyze three techniques for solving the resulting ε-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side-length ε) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure of Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand.
We compare our methods experimentally on real and synthetic data. Our experiments show that for the typical values of n and ε, the primal-dual method is the fastest, also in practice.