Quentin Berthet
I am a Research Scientist in the Brain team of Google Research, in Paris. I work on core Machine Learning, drawing on tools from Statistics and Optimization.
Before coming to Google, I was a lecturer (a tenured faculty position) at the University of Cambridge (Statistical Laboratory). Before that, I was a CMI postdoc at Caltech. I did my Ph.D. at Princeton University, and I am an alumnus of Ecole Polytechnique.
Authored Publications
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer
Aaron Wenger
Andrew Walker Carroll
Armin Töpfer
Ashish Teku Vaswani
Daniel Cook
Felipe Llinares
Gunjan Baid
Howard Cheng-Hao Yang
Jean-Philippe Vert
Kishwar Shafin
Maria Nattestad
Waleed Ammar
William J. Rowell
Nature Biotechnology (2022)
Abstract
Genomic analysis requires accurate sequencing at sufficient coverage and over difficult genome regions. Through repeated sampling of a circular template, Pacific Biosciences developed long (10-25 kb) reads with high overall accuracy, but lower homopolymer accuracy. Here, we introduce DeepConsensus, a transformer-based approach that leverages a unique alignment loss to correct sequencing errors. DeepConsensus reduces errors in PacBio HiFi reads by 42% compared to the current approach. We show this increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27%, and at Q40 by 90%. With two SMRT cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 Mb to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (QV43 to QV45), and reduce variant calling errors by 24%.
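For intuition, here is a minimal sketch of the classical majority-vote consensus over pre-aligned subreads of a circular template, the kind of baseline DeepConsensus improves upon. This is not the DeepConsensus model itself (its transformer and gap-aware alignment loss are beyond a few lines); the function name and toy data are illustrative.

```python
import numpy as np

def majority_vote_consensus(subreads):
    """Naive consensus over aligned subreads of a circular template.

    `subreads` is a list of equal-length strings over {A, C, G, T, -},
    already aligned column by column; '-' marks a gap. This is the
    classical baseline: DeepConsensus instead feeds aligned subreads
    to a transformer trained with a gap-aware alignment loss.
    """
    consensus = []
    for col in zip(*subreads):
        bases = [b for b in col if b != "-"]      # ignore gaps in the vote
        if not bases:
            continue                              # all-gap column: emit nothing
        vals, counts = np.unique(bases, return_counts=True)
        consensus.append(vals[np.argmax(counts)])  # most frequent base wins
    return "".join(consensus)

# Example: three noisy passes over the same template.
print(majority_vote_consensus(["ACGT-A", "ACGTTA", "AC-TTA"]))  # "ACGTTA"
```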
How DeepConsensus Works
Aaron Wenger
Anastasiya Belyaeva
Andrew Carroll
Armin Töpfer
Ashish Teku Vaswani
Daniel Cook
Felipe Llinares
Gunjan Baid
Howard Yang
Jean-Philippe Vert
Kishwar Shafin
Maria Nattestad
Waleed Ammar
William J. Rowell
(2022)
Slides for a public video about DeepConsensus.
Abstract
Sorting is an elementary building block of modern software. In machine learning and statistics, it is commonly used in robust statistics, order statistics and ranking metrics. However, sorting is a piecewise linear function and as a result includes many kinks at which it is non-differentiable. More problematic, the ranking operator is a piecewise constant function, meaning that its derivatives are null or undefined. While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one could expect from a sorting or ranking operation. In this paper, we propose the first differentiable sorting and ranking operators with $O(n \log n)$ time and $O(n)$ space complexity. Our proposal in addition enjoys exact computation and differentiation. We achieve this feat by casting differentiable sorting and ranking as projections onto a permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization. Empirically, we confirm that our approach is an order of magnitude faster than existing approaches. We also showcase two novel applications: differentiable Spearman's rank coefficient and differentiable least trimmed squares.
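To make the reduction concrete, here is a minimal NumPy sketch of a soft ranking operator in the sense described above: ranks are obtained as the Euclidean projection of -theta/eps onto the permutahedron of (n, ..., 1), computed with one sort plus Pool Adjacent Violators isotonic regression, hence O(n log n) time and O(n) space. Function names and the choice of eps are our own; the paper's full construction also covers soft sorting and other regularizations.

```python
import numpy as np

def _isotonic_nonincreasing(y):
    # Pool Adjacent Violators: best non-increasing fit to y in L2.
    blocks = []  # [sum, count] per block; block means stay non-increasing
    for v in y:
        blocks.append([v, 1.0])
        # merge while a previous block mean is <= the next block mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] <= blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(int(c), s / c) for s, c in blocks])

def soft_rank(theta, eps=1.0):
    """Differentiable ranks of `theta` (rank 1 = largest entry).

    Euclidean projection of -theta/eps onto the permutahedron of
    (n, ..., 1): one descending sort, then isotonic regression.
    """
    n = theta.shape[0]
    rho = np.arange(n, 0, -1, dtype=float)   # (n, n-1, ..., 1)
    z = -theta / eps
    sigma = np.argsort(-z)                    # sort z in descending order
    v = _isotonic_nonincreasing(z[sigma] - rho)
    out = np.empty(n)
    out[sigma] = z[sigma] - v                 # undo the sort
    return out

theta = np.array([3.0, 1.0, 2.0])
print(soft_rank(theta, eps=0.1))   # small eps: recovers the hard ranks (1, 3, 2)
print(soft_rank(theta, eps=100.))  # large eps: ranks shrink toward their mean, 2
```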
Abstract
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting (tests can be mistaken) and decide adaptively (looking at past results) which groups to test next, with the aim of converging to a good detection as quickly, and with as few tests, as possible. We cast this problem as a Bayesian sequential experimental design problem. Using the posterior distribution of infection status vectors for n patients, given the tests observed so far, we seek to form groups that have maximal utility. We consider utilities such as mutual information, but also quantities with a more direct relevance to testing, such as the AUC of the ROC curve of the test. Practically, the posterior distributions on {0,1}^n are approximated by sequential Monte Carlo (SMC) samplers and the utility is maximized by a greedy optimizer. In simulations, our procedures show significant improvements over both adaptive and non-adaptive baselines, and are far more efficient than individual tests when disease prevalence is low. Additionally, we show empirically that loopy belief propagation (LBP), widely regarded as the state-of-the-art decoder for deciding whether an individual is infected given previous test results, can be unreliable and exhibit oscillatory behavior. Our SMC decoder is more reliable, and can improve the performance of other group testing algorithms.
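As an illustration of the greedy utility-maximization step, the sketch below scores a candidate group by the mutual information between a noisy test outcome and the infection-status vector, estimated from weighted posterior particles such as those an SMC sampler would produce. All names, and the sensitivity/specificity values, are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def _entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mutual_information(group, particles, weights, sens=0.95, spec=0.98):
    """I(test outcome; infection states), estimated from posterior particles.

    `particles` is an (m, n) binary array of infection-status vectors
    (e.g. from an SMC sampler) and `weights` their posterior weights.
    A noisy group test fires with probability `sens` if the group
    contains an infected individual, and `1 - spec` otherwise.
    """
    positive = particles[:, group].any(axis=1)        # group truly positive?
    p_fire = np.where(positive, sens, 1.0 - spec)     # P(test=1 | particle)
    h_marginal = _entropy(np.dot(weights, p_fire))    # H(Y)
    h_conditional = np.dot(weights, _entropy(p_fire)) # H(Y | X)
    return h_marginal - h_conditional

def greedy_group(particles, weights, max_size):
    """Grow a group one individual at a time, maximizing mutual information."""
    n = particles.shape[1]
    group = []
    for _ in range(max_size):
        candidates = [i for i in range(n) if i not in group]
        gains = [mutual_information(group + [i], particles, weights)
                 for i in candidates]
        group.append(candidates[int(np.argmax(gains))])
    return group

# Toy posterior: 1000 equally weighted particles over n=8 individuals,
# prevalence around 5%.
rng = np.random.default_rng(0)
particles = (rng.random((1000, 8)) < 0.05).astype(int)
weights = np.full(1000, 1e-3)
print(greedy_group(particles, weights, max_size=4))
```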
Learning with Differentiable Perturbed Optimizers
Olivier Teboul
Marco Cuturi
Jean-Philippe Vert
Francis Bach
Advances in Neural Information Processing Systems 33 (NeurIPS), Curran Associates, Inc. (2020), pp. 9508-9519
Abstract
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g. sorting, picking closest neighbors, finding shortest paths or optimal matchings). Although these discrete decisions are easily computed in a forward manner, they cannot be used to modify model parameters using first-order optimization techniques because they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to transform a block that outputs an optimal discrete decision into a differentiable operation. Our approach relies on stochastic perturbations of these parameters, and can be used readily within existing solvers without the need for ad hoc regularization or smoothing. These perturbed optimizers yield solutions that are differentiable and never locally constant. The amount of smoothness can be tuned via the chosen noise amplitude, whose impact we analyze. The derivatives of these perturbed solvers can be evaluated efficiently. We also show how this framework can be connected to a family of losses developed in structured prediction, and describe how these can be used in unsupervised and supervised learning, with theoretical guarantees. We demonstrate the performance of our approach on several machine learning tasks in experiments on synthetic and real data.
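A minimal instance of the construction: smoothing a one-hot argmax with Gaussian perturbations. The forward pass averages perturbed solutions, and the Jacobian is estimated with the Gaussian-noise formula E[y*(theta + eps Z) Z^T] / eps from this perturbation framework. The plain-NumPy, Monte Carlo sketch below is illustrative; a real pipeline would wrap it in an autodiff framework.

```python
import numpy as np

def one_hot_argmax(theta):
    """The discrete solver: optimal vertex of the simplex for scores theta."""
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_argmax(theta, eps=0.5, n_samples=10_000, rng=None):
    """Monte Carlo estimate of the perturbed optimizer and its Jacobian.

    Forward:  y_eps(theta) = E[argmax(theta + eps * Z)],  Z ~ N(0, I).
    Jacobian: J(theta) = E[argmax(theta + eps * Z) Z^T] / eps
    (the Gaussian case of the perturbed-optimizer gradient estimator).
    """
    rng = rng or np.random.default_rng(0)
    Z = rng.standard_normal((n_samples, theta.shape[0]))
    Y = np.stack([one_hot_argmax(theta + eps * z) for z in Z])
    y_eps = Y.mean(axis=0)              # smoothed, never locally constant
    jac = Y.T @ Z / (eps * n_samples)   # d y_eps / d theta
    return y_eps, jac

theta = np.array([2.0, 1.0, 1.5])
y, J = perturbed_argmax(theta)
print(y)   # a smoothed one-hot vector, mass concentrated on index 0
print(J)   # rows sum to ~0: shifting all scores together changes nothing
```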
Abstract
Optimal transport is a foundational problem in optimization that makes it possible to compare probability distributions while taking geometric aspects into account. Its optimal objective value, the Wasserstein distance, provides an important loss between distributions that has been used in many applications throughout machine learning and statistics. Recent algorithmic progress on this problem and its regularized versions has made these tools increasingly popular. However, existing techniques require solving an optimization problem to obtain a single gradient of the loss, which slows down first-order methods for minimizing a sum of such losses, since they require many gradient computations. In this work, we introduce an algorithm to solve a regularized version of this problem of Wasserstein estimators, with a time per step that is sublinear in the natural dimensions of the problem. We introduce a dual formulation and optimize it with stochastic gradient steps that can be computed directly from samples, without solving additional optimization problems at each step. In doing so, the estimation and computation tasks are performed jointly. We show that this algorithm can be extended to other tasks, including the estimation of Wasserstein barycenters. We provide theoretical guarantees and illustrate the performance of our algorithm with experiments on synthetic data.
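For intuition about dual stochastic gradients computed directly from samples, here is a sketch of the classical stochastic semi-dual approach to entropy-regularized optimal transport (in the spirit of Genevay et al., 2016), which this line of work builds on: each step samples one source point and updates the dual potentials in time linear in the number of target points, with no inner optimization problem. This is not the paper's sublinear algorithm; all names and parameters are illustrative.

```python
import numpy as np

def semidual_sgd(sample_x, ys, b, cost, eps=0.1, lr=1.0, n_steps=20_000, rng=None):
    """Averaged SGD on the entropic OT semi-dual.

    `ys` are target support points with weights `b`; `sample_x` draws
    one source sample per step. Each update costs O(len(ys)) and uses
    no inner solver.
    """
    rng = rng or np.random.default_rng(0)
    v = np.zeros(len(ys))
    v_avg = np.zeros_like(v)
    for t in range(1, n_steps + 1):
        x = sample_x(rng)
        logits = (v - cost(x, ys)) / eps
        chi = b * np.exp(logits - logits.max())
        chi /= chi.sum()                    # soft matching of x to the targets
        v += (lr / np.sqrt(t)) * (b - chi)  # stochastic semi-dual gradient
        v_avg += (v - v_avg) / t            # Polyak-Ruppert averaging
    return v_avg

# Toy problem: Gaussian source samples, uniform target weights on a grid.
ys = np.linspace(-2.0, 2.0, 11)
b = np.full(11, 1 / 11)
v = semidual_sgd(lambda rng: rng.standard_normal(),
                 ys, b, cost=lambda x, ys: (x - ys) ** 2)
print(v)  # dual potentials of the regularized transport problem
```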
Abstract
We consider the problem of graph logistic regression, based on partial observation of a large network and on side information associated with its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analyzed in a high-dimensional regime under a structural assumption. The optimal statistical rates are derived, and an estimator based on penalized maximum likelihood is shown to attain them. The algorithmic aspects of this problem are also studied: optimal rates under computational constraints are derived and shown to differ from the information-theoretic rates, under a complexity assumption.
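A hedged sketch of the estimation step: penalized maximum likelihood for a matrix logistic regression P(A_uv = 1) = sigmoid(x_u^T Theta x_v), fit by proximal gradient descent. The l1 penalty below stands in for the paper's structural penalty, and all names and constants are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def graph_logistic_mle(X, edges, labels, lam=0.1, lr=0.1, n_steps=500):
    """Penalized MLE for matrix logistic regression on a partially
    observed graph: P(A_uv = 1) = sigmoid(x_u^T Theta x_v).

    `edges` is an (m, 2) array of observed vertex pairs, `labels` the
    observed 0/1 adjacency values, `X` an (n, d) matrix of vertex side
    information. Proximal gradient with soft-thresholding applies the
    penalty after each gradient step.
    """
    d = X.shape[1]
    theta = np.zeros((d, d))
    for _ in range(n_steps):
        xu, xv = X[edges[:, 0]], X[edges[:, 1]]            # features per pair
        p = sigmoid(np.einsum("md,de,me->m", xu, theta, xv))
        resid = p - labels                                  # logistic residuals
        grad = np.einsum("m,md,me->de", resid, xu, xv) / len(labels)
        theta -= lr * grad
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
    return theta

# Toy run: n=50 vertices with d=3 features and a sparse planted Theta.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
theta_star = np.zeros((3, 3))
theta_star[0, 1] = 3.0
edges = rng.integers(0, 50, size=(2000, 2))
probs = sigmoid(np.einsum("md,de,me->m", X[edges[:, 0]], theta_star, X[edges[:, 1]]))
labels = (rng.random(2000) < probs).astype(float)
print(graph_logistic_mle(X, edges, labels).round(2))  # (0, 1) entry should dominate
```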