Quentin Berthet

Quentin Berthet

I am a Research Scientist in the Brain team of Google Research, in Paris. I work on core Machine Learning, with interests in Statistics and Optimization to tackle these problems. Before coming to Google, I was a lecturer (tenured faculty position) at the University of Cambridge (Statistical Laboratory). Before that I was a CMI postdoc at Caltech. I did my Ph.D. at Princeton University and I am an alumni of Ecole Polytechnique. [Homepage]
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    How DeepConsensus Works
    Aaron Wenger
    Anastasiya Belyaeva
    Andrew Carroll
    Armin Töpfer
    Ashish Teku Vaswani
    Daniel Cook
    Felipe Llinares
    Gunjan Baid
    Howard Yang
    Jean-Philippe Vert
    Kishwar Shafin
    Maria Nattestad
    Waleed Ammar
    William J. Rowell
    (2022)
    Preview abstract N/A These are slides for a public video about DeepConsensus View details
    DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer
    Aaron Wenger
    Andrew Walker Carroll
    Armin Töpfer
    Ashish Teku Vaswani
    Daniel Cook
    Felipe Llinares
    Gunjan Baid
    Howard Cheng-Hao Yang
    Jean-Philippe Vert
    Kishwar Shafin
    Maria Nattestad
    Waleed Ammar
    William J. Rowell
    Nature Biotechnology(2022)
    Preview abstract Genomic analysis requires accurate sequencing in sufficient coverage and over difficult genome regions. Through repeated sampling of a circular template, Pacific Biosciences developed long (10-25kb) reads with high overall accuracy, but lower homopolymer accuracy. Here, we introduce DeepConsensus, a transformer-based approach which leverages a unique alignment loss to correct sequencing errors. DeepConsensus reduces errors in PacBio HiFi reads by 42%, compared to the current approach. We show this increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27%, and at Q40 by 90%. With two SMRT cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9Mb to 17.2Mb), increase gene completeness (94% to 97%), reduce false gene duplication rate (1.1% to 0.5%), and improve assembly base accuracy (QV43 to QV45), and also reduce variant calling errors by 24%. View details
    Preview abstract We consider the problem of graph logistic regression, based on partial observation of a large network, and on side information associated to its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analyzed in a high-dimensional regime under a structural assumption. The optimal statistical rates are derived, and an estimator based on penalized maximum likelihood is shown to attain it. The algorithmic aspects of this problem are also studied, and optimal rates under computational constraints are derived, and shown to differ from the information-theoretic rates - under a complexity assumption. View details
    Learning with Differentiable Perturbed Optimizers
    Olivier Teboul
    Marco Cuturi
    Jean-Philippe Vert
    Francis Bach
    Advances in Neural Information Processing Systems 33 (NeurIPS), Curran Associates, Inc.(2020), pp. 9508-9519
    Preview abstract Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g. sorting, picking closest neighbors, finding shortest paths or optimal matchings). Although these discrete decisions are easily computed in a forward manner, they cannot be used to modify model parameters using first-order optimization techniques because they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to transform a block that outputs an optimal discrete decision into a differentiable operation. Our approach relies on stochastic perturbations of these parameters, and can be used readily within existing solvers without the need for ad hoc regularization or smoothing. These perturbed optimizers yield solutions that are differentiable and never locally constant. The amount of smoothness can be tuned via the chosen noise amplitude, whose impact we analyze. The derivatives of these perturbed solvers can be evaluated efficiently. We also show how this framework can be connected to a family of losses developed in structured prediction, and describe how these can be used in unsupervised and supervised learning, with theoretical guarantees. We demonstrate the performance of our approach on several machine learning tasks in experiments on synthetic and real data. View details
    Fast Differentiable Sorting and Ranking
    Olivier Teboul
    Josip Djolonga
    ICML(2020) (to appear)
    Preview abstract Sorting is an elementary building block of modern software. In machine learning and statistics, it is commonly used in robust statistics, order statistics and ranking metrics. However, sorting is a piecewise linear function and as a result includes many kinks at which it is non-differentiable. More problematic, the ranking operator is a piecewise constant function, meaning that its derivatives are null or undefined. While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one could expect from a sorting or ranking operation. In this paper, we propose the first differentiable sorting and ranking operators with $O(n \log n)$ time and $O(n)$ space complexity. Our proposal in addition enjoys exact computation and differentiation. We achieve this feat by casting differentiable sorting and ranking as projections onto a permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization. Empirically, we confirm that our approach is an order of magnitude faster than existing approaches. We also showcase two novel applications: differentiable Spearman's rank coefficient and differentiable least trimmed squares. View details
    Stochastic Optimization for Regularized Wasserstein Estimators
    Francis Bach
    Marin Ballu
    submitted to ICML(2020) (to appear)
    Preview abstract Optimal transport is a foundational problem in optimization, that allows to compare probability distributions while taking into account geometric aspects. Its optimal objective value, the Wasserstein distance, provides an important loss between distributions that has been used in many applications throughout machine learning and statistics. Recent algorithmic progress on this problem and its regularized versions have made these tools increasingly popular. However, existing techniques require solving an optimization problem to obtain a single gradient of the loss, thus slowing down first-order methods to minimize the sum of losses, that require many such gradient computations. In this work, we introduce an algorithm to solve a regularized version of this problem of Wasserstein estimators, with a time per step which is sublinear in the natural dimensions of the problem. We introduce a dual formulation, and optimize it with stochastic gradient steps that can be computed directly from samples, without solving additional optimization problems at each step. Doing so, the estimation and computation tasks are performed jointly. We show that this algorithm can be extended to other tasks, including estimation of Wasserstein barycenters. We provide theoretical guarantees and illustrate the performance of our algorithm with experiments on synthetic data. View details
    Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design
    Marco Cuturi
    Olivier Teboul
    Arnaud Doucet
    Jean-Philippe Vert
    arXiv(2020)
    Preview abstract When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting (tests can be mistaken) to decide adaptively (looking at past results) which groups to test next, with the goal to converge to a good detection, as quickly, and with as few tests as possible. We cast this problem as a Bayesian sequential experimental design problem. Using the posterior distribution of infection status vectors for n patients, given observed tests carried out so far, we seek to form groups that have a maximal utility. We consider utilities such as mutual information, but also quantities that have a more direct relevance to testing, such as the AUC of the ROC curve of the test. Practically, the posterior distributions on {0,1}^n are approximated by sequential Monte Carlo (SMC) samplers and the utility maximized by a greedy optimizer. Our procedures show in simulations significant improvements over both adaptive and non-adaptive baselines, and are far more efficient than individual tests when disease prevalence is low. Additionally, we show empirically that loopy belief propagation (LBP), widely regarded as the SoTA decoder to decide whether an individual is infected or not given previous tests, can be unreliable and exhibit oscillatory behavior. Our SMC decoder is more reliable, and can improve the performance of other group testing algorithms. View details