# Walid Krichene

Walid Krichene received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley. His research focuses on optimization theory and differential privacy, and their applications to recommendation systems. He received an M.S. in Applied Mathematics from the École des Mines ParisTech and an M.A. in Mathematics from the University of California, Berkeley. He previously worked at Criteo Labs, the CDC department at Caltech, and the CAOR group at the École des Mines. Walid received the Best Paper Award at KDD 2020, the Leon Chua Award for outstanding achievement in nonlinear science, the Valedictorian Prize for the Tunisian Baccalaureate, and a Bronze Medal at the Pan African Mathematics Olympiad.

### Authored Publications


Private Alternating Least Squares: (Nearly) Optimal Privacy/Utility Trade-off for Matrix Completion

Abhradeep Guha Thakurta

Li Zhang

Steffen Rendle

ICML (2021)

We study the problem of differentially private (DP) matrix completion under user-level privacy. We design an $(\epsilon,\delta)$-joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) best known privacy/utility trade-off both theoretically, as well as on benchmark data sets.
In particular, despite the non-convexity of low-rank matrix completion and ALS, we provide the first global convergence analysis of ALS with *noise* introduced to ensure DP. For $n$ being the number of users and $m$ being the number of items in the rating matrix, our analysis requires only about $\log(n+m)$ samples per user (ignoring rank and condition number factors) and obtains a sample complexity of $n=\tilde\Omega(m/(\sqrt{\zeta}\cdot \epsilon))$ to ensure a relative Frobenius norm error of $\zeta$. This improves significantly on the previous state-of-the-art result of $n=\tilde\Omega\left(m^{5/4}/(\zeta^{5}\epsilon)\right)$ for the private-FW method of Jain et al. (2018).
Furthermore, we extensively validate our method on synthetic and benchmark data sets (MovieLens 10M, MovieLens 20M), and observe that private ALS suffers only a 6 percentage point drop in accuracy compared to the non-private baseline for $\epsilon\leq 10$. Moreover, compared to the prior work of Jain et al. (2018), it is better by at least 10 percentage points for all choices of the privacy parameters.
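To illustrate the mechanism behind noisy ALS, here is a minimal numpy sketch of one item-side update, in which Gaussian noise is added to the sufficient statistics of each per-item least-squares solve. The function name, the fixed ridge term, and the noise scale are illustrative assumptions; the paper's actual algorithm additionally clips user contributions and calibrates the noise to the $(\epsilon,\delta)$ budget.

```python
import numpy as np

def private_als_item_step(U, ratings, sigma, rank, ridge=0.1):
    """One noisy item-side ALS update (illustrative sketch).

    Solves a ridge least-squares problem per item, with Gaussian noise of
    scale `sigma` added to the sufficient statistics (the Gram matrix and
    the cross-term). Noise calibration and per-user clipping are omitted.
    """
    rng = np.random.default_rng(0)
    V = np.zeros((len(ratings), rank))
    for j, item_ratings in enumerate(ratings):
        users = [u for u, _ in item_ratings]   # users who rated item j
        r = np.array([x for _, x in item_ratings])
        A = U[users]                           # their current user factors
        # Gaussian mechanism on the sufficient statistics.
        G = A.T @ A + sigma * rng.standard_normal((rank, rank))
        G = (G + G.T) / 2                      # keep the noisy Gram symmetric
        b = A.T @ r + sigma * rng.standard_normal(rank)
        V[j] = np.linalg.solve(G + ridge * np.eye(rank), b)
    return V
```

With `sigma=0` this reduces to a plain regularized ALS half-step, which makes the structure of the private variant easy to compare against the non-private baseline.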

Efficient Training on Very Large Corpora via Gramian Estimation

Nicolas Mayoraz

Steffen Rendle

Li Zhang

Lichan Hong

John Anderson

ICLR (2019)

We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale.
We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement both in training time and generalization performance compared to sampling methods.
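The core identity behind the Gramian approach can be checked in a few lines: for embeddings $U \in \mathbb{R}^{n\times k}$ and $V \in \mathbb{R}^{m\times k}$, the global quadratic penalty over all $nm$ pairs, $\sum_{i,j}(u_i^\top v_j)^2$, equals the inner product of the two $k\times k$ Gramians, so it can be computed in $O((n+m)k^2)$ instead of $O(nmk)$. A minimal numpy sketch (the function name is mine):

```python
import numpy as np

def gramian_penalty(U, V):
    """Global quadratic penalty sum_{i,j} (u_i . v_j)^2 computed as the
    inner product <G_u, G_v> of the two k x k Gramians, avoiding any
    pass over the n*m unobserved pairs."""
    Gu = U.T @ U               # k x k Gramian of the left embeddings
    Gv = V.T @ V               # k x k Gramian of the right embeddings
    return float(np.sum(Gu * Gv))
```

The identity follows from $\sum_{i,j}(u_i^\top v_j)^2 = \operatorname{tr}(U^\top U\, V^\top V)$; the paper's contribution is then to maintain stochastic estimates of these Gramians during training, with variance reduction.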

Randomized Fractal Expansions for Production-Scale Public Collaborative-Filtering Data Sets

Francois Belletti

John Anderson

Karthik Singaram Lakshmanan

Nicolas Mayoraz

Pankaj Kanwar

Taylor Robie

Tayo Oguntebi

Yi-fan Chen

Arxiv (2019)

Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems.
In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets.
Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices.
Preserving such properties is key to building large realistic synthetic data sets which can be employed reliably to benchmark recommender systems and the systems employed to train them.
We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users.
The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down.
Furthermore, we present collaborative filtering experiments demonstrating that the generated synthetic data entails valuable insights for machine learning at scale in recommender systems.
We provide code pointers to reproduce our data and our experiments.
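To convey the flavor of the expansion, here is a heavily simplified numpy sketch: it forms a small block from the binarized matrix and takes a Kronecker product with the original, multiplying the numbers of rows, columns, and interactions. This is illustrative only; the paper derives the small block from a randomized reduction of $R$ (preserving its singular value spectrum) rather than plain subsampling, and the names below are mine.

```python
import numpy as np

def fractal_expand(R, reduced_size):
    """Kronecker-style expansion of a binarized user/item matrix R
    (illustrative sketch: the small block S is a random submatrix here,
    not the randomized SVD-based reduction used in the paper)."""
    rng = np.random.default_rng(0)
    n, m = R.shape
    rows = rng.choice(n, size=reduced_size, replace=False)
    cols = rng.choice(m, size=reduced_size, replace=False)
    S = R[np.ix_(rows, cols)]      # small self-similar block of R
    # Each nonzero of S stamps a full copy of R into the output.
    return np.kron(S, R)           # (reduced_size*n) x (reduced_size*m)
```

The Kronecker product is what gives the fractal, self-similar structure: popular rows of `S` replicate the whole engagement distribution of `R`, which is how fat-tailed statistics survive the expansion.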

Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions

Francois Belletti

Karthik Singaram Lakshmanan

Nicolas Mayoraz

Yi-fan Chen

John Anderson

Taylor Robie

Tayo Oguntebi

Amit Bleiwess

Dan Shirron

arXiv (2019)

Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems.
In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets.
Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices.
Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark recommender systems and the systems employed to train them. We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users.
The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down.

Acceleration and Averaging in Stochastic Descent Dynamics

Peter Bartlett

31st Conference on Neural Information Processing Systems (NIPS) (2017)

We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions. Building on an averaging formulation of accelerated mirror descent, we propose a stochastic variant in which the gradient is contaminated by noise, and study the resulting stochastic differential equation. We prove a bound on the rate of change of an energy function associated with the problem, then use it to derive estimates of convergence rates of the function values (almost surely and in expectation), both for persistent and asymptotically vanishing noise. We discuss the interaction between the parameters of the dynamics (learning rate and averaging rates) and the covariation of the noise process. In particular, we show how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate.
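Concretely, the averaging formulation of accelerated mirror descent couples a dual variable accumulating gradients with a primal average of the mirrored trajectory; the stochastic variant perturbs the dual dynamics with a Brownian term. A sketch of the system, with notation that is mine ($\psi^*$ the conjugate mirror map, $r > 1$ an averaging parameter):

```latex
% Averaging formulation of accelerated mirror descent (deterministic case):
\dot{Z}_t = -\frac{t}{r}\,\nabla f(X_t),
\qquad
\dot{X}_t = \frac{r}{t}\left(\nabla\psi^*(Z_t) - X_t\right)
% Stochastic variant: the gradient is contaminated by noise, giving a
% stochastic differential equation for the dual variable:
dZ_t = -\frac{t}{r}\,\nabla f(X_t)\,dt + \sigma(t)\,dB_t
```

The Lyapunov analysis in the paper then bounds the rate of change of an energy function along these dynamics, for both persistent and vanishing $\sigma(t)$.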

Adaptive Averaging in Accelerated Descent Dynamics

Alexandre Bayen

Peter Bartlett

30th Conference on Neural Information Processing Systems (NIPS) (2016)

We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.

Continuous and Discrete Dynamics for Online Learning and Convex Optimization

Ph.D. Thesis, University of California, Berkeley (2016)

On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game

Kiet Lam

Alexandre Bayen

ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS) (2016)

Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

Maximilian Balandat

Claire Tomlin

Alexandre Bayen

Advances in Neural Information Processing Systems 29 (NIPS) (2016)

On Social Optimal Routing Under Selfish Learning

Milena Suarez Castillo

Alexandre Bayen

IEEE Transactions on Control of Network Systems (2016)

Differential Privacy of Populations in Routing Games

Roy Dong

Alexandre Bayen

S. Shankar Sastry

IEEE 54th Annual Conference on Decision and Control (CDC) (2015), pp. 2798-2803

Accelerated Mirror Descent in Continuous and Discrete Time

Alexandre Bayen

Peter Bartlett

Advances in Neural Information Processing Systems 28 (NIPS) (2015), pp. 2845-2853

The Hedge Algorithm on a Continuum

Maximilian Balandat

Claire Tomlin

Alexandre Bayen

32nd International Conference on Machine Learning (ICML) (2015)

Efficient Bregman Projections onto the Simplex

Syrine Krichene

Alexandre Bayen

IEEE 54th Annual Conference on Decision and Control (CDC) (2015)

Online Learning of Nash Equilibria in Congestion Games

On the Convergence of No-Regret Learning in Selfish Routing

Benjamin Drighès

Alexandre Bayen

31st International Conference on Machine Learning (ICML) (2014), pp. 163-171

A PDE-ODE Model for a Junction with Ramp Buffer

Maria-Laura Delle Monache

Jack Reilly

Samitha Samaranayake

Paula Goatin

Alexandre Bayen

SIAM Journal on Applied Mathematics, vol. 74 (2014), pp. 22-39

Stability of Nash Equilibria in the Congestion Game under Replicator Dynamics

Benjamin Drighès

Alexandre Bayen

IEEE 53rd Annual Conference on Decision and Control (CDC) (2014)