# Walid Krichene

Walid Krichene received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley. His research focuses on optimization theory and differential privacy, and their applications to recommendation systems. He holds an M.S. in Applied Mathematics from the École des Mines ParisTech and an M.A. in Mathematics from the University of California, Berkeley. He previously worked at Criteo Labs, the CDS department at Caltech, and the CAOR group at the École des Mines. Walid received the Best Paper Award at KDD 2020, the Leon Chua Award for outstanding achievement in nonlinear science, the Valedictorian Prize for the Tunisian Baccalaureate, and a Bronze Medal at the Pan African Math Olympiads.

### Publications

Revisiting the Performance of iALS on Item Recommendation Benchmarks

Steffen Rendle

Li Zhang

Yehuda Koren

Proceedings of the 16th ACM Conference on Recommender Systems, Association for Computing Machinery (2022), pp. 427-435

Matrix factorization learned by implicit alternating least squares (iALS) is a popular baseline in recommender system research publications. iALS is known to be one of the most computationally efficient and scalable collaborative filtering methods. However, recent studies suggest that its prediction quality is not competitive with the current state of the art, in particular autoencoders and other item-based collaborative filtering methods. In this work, we revisit four well-studied benchmarks where iALS was reported to perform poorly and show that with proper tuning, iALS is highly competitive and outperforms any method on at least half of the comparisons. We hope that these high-quality results, together with iALS's known scalability, spark new interest in applying and further improving this decade-old technique.
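For readers unfamiliar with iALS, the alternating structure can be sketched in a few lines (a simplified implicit-ALS pass with hypothetical parameter values, not the tuned implementation benchmarked in the paper):

```python
import numpy as np

def ials_step(R, U, V, alpha=10.0, reg=0.1):
    """One alternating pass of a simplified implicit ALS (iALS):
    ridge-regression solves for user factors U, then item factors V.
    R is a binary interaction matrix (users x items); observed entries
    get weight (1 + alpha), unobserved ones weight 1. The parameter
    values here are hypothetical, not the paper's tuned settings."""
    k = U.shape[1]
    for X, Y, M in ((U, V, R), (V, U, R.T)):
        G = Y.T @ Y  # Gramian over the full opposite-side factor set
        for i in range(X.shape[0]):
            idx = np.nonzero(M[i])[0]            # observed interactions of row i
            Yi = Y[idx]
            A = G + alpha * Yi.T @ Yi + reg * np.eye(k)
            b = (1.0 + alpha) * Yi.sum(axis=0)
            X[i] = np.linalg.solve(A, b)
    return U, V

rng = np.random.default_rng(0)
R = (rng.random((30, 40)) < 0.2).astype(float)
U = 0.1 * rng.standard_normal((30, 8))
V = 0.1 * rng.standard_normal((40, 8))
for _ in range(5):
    U, V = ials_step(R, U, V)
```

The key efficiency trick is visible in the shared Gramian `G`, which lets the solver account for all unobserved entries without enumerating them.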

Private Alternating Least Squares: (Nearly) Optimal Privacy/Utility Trade-off for Matrix Completion

Abhradeep Guha Thakurta

Li Zhang

Steffen Rendle

2021

We study the problem of differentially private (DP) matrix completion under user-level privacy. We design an $(\epsilon,\delta)$-joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of the number of items and users), and ii) the best known privacy/utility trade-off, both theoretically and on benchmark data sets.
In particular, despite the non-convexity of low-rank matrix completion and ALS, we provide the first global convergence analysis of ALS with noise introduced to ensure DP. For $n$ the number of users and $m$ the number of items in the rating matrix, our analysis requires only about $\log(n+m)$ samples per user (ignoring rank and condition number factors) and obtains a sample complexity of $n=\tilde\Omega(m/(\sqrt{\zeta}\cdot \epsilon))$ to ensure a relative Frobenius norm error of $\zeta$. This improves significantly on the previous state-of-the-art result of $n=\tilde\Omega\left(m^{5/4}/(\zeta^{5}\epsilon)\right)$ for the private-FW method of \citet{jain2018differentially}.
Furthermore, we extensively validate our method on synthetic and benchmark data sets (MovieLens 10M, MovieLens 20M), and observe that private ALS suffers a drop of only 6 percentage points in accuracy compared to the non-private baseline for $\epsilon\leq 10$. Moreover, compared to the prior work of \citet{jain2018differentially}, it is better by at least 10 percentage points for all choices of the privacy parameters.
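The core mechanism, Gaussian noise added to the sufficient statistics of each least-squares solve, can be sketched as follows (a simplification with uncalibrated, hypothetical noise parameters; the paper's algorithm also clips user contributions and calibrates the noise to obtain formal joint-DP guarantees):

```python
import numpy as np

def private_ls_solve(Y, r, reg=0.1, noise_scale=0.05, rng=None):
    """One noisy ridge-regression step, sketching the ALS-with-noise
    idea: Gaussian noise perturbs the sufficient statistics (A, b)
    before solving. Noise scale and regularization here are
    hypothetical, not calibrated to any (epsilon, delta)."""
    rng = np.random.default_rng(0) if rng is None else rng
    k = Y.shape[1]
    A = Y.T @ Y + reg * np.eye(k)
    b = Y.T @ r
    N = noise_scale * rng.standard_normal((k, k))
    A_noisy = A + (N + N.T) / 2.0        # symmetrized noise keeps A symmetric
    b_noisy = b + noise_scale * rng.standard_normal(k)
    return np.linalg.solve(A_noisy, b_noisy)

rng = np.random.default_rng(1)
Y = rng.standard_normal((100, 5))       # factors of the items one user rated
r = rng.standard_normal(100)            # that user's observed ratings
u = private_ls_solve(Y, r, rng=rng)     # noisy user factor
```

Because only the aggregates `A` and `b` are released to the solver, the per-rating data never leaves the statistic computation, which is what makes the noise calibration of the paper possible.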

Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions

Francois Belletti

Karthik Singaram Lakshmanan

Nicolas Mayoraz

Yi-fan Chen

John Anderson

Taylor Robie

Tayo Oguntebi

Amit Bleiwess

Dan Shirron

arXiv (2019)

Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems.
In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets.
Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher-order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and the singular value spectra of user/item interaction matrices.
Preserving such properties is key to building large realistic synthetic data sets, which in turn can be employed reliably to benchmark recommender systems and the systems employed to train them. We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users.
The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, and can be scaled up or down.
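The expansion idea can be illustrated with a plain (non-randomized) Kronecker product in NumPy; the paper's algorithm randomizes this step to avoid exact self-similarity:

```python
import numpy as np

# Toy user/item incidence matrix (3 users x 4 items).
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)

# Expander block: a small pattern whose Kronecker product with R
# multiplies both the number of rows and columns while roughly
# preserving the sparsity structure.
B = np.array([[1, 0],
              [1, 1]], dtype=float)

R_big = np.kron(R, B)   # (3*2) users x (4*2) items
```

Note that the number of interactions multiplies too: `R_big` has `nnz(R) * nnz(B)` non-zeros, which is how repeated expansion reaches billions of interactions from a 20M-interaction seed.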

Efficient Training on Very Large Corpora via Gramian Estimation

Nicolas Mayoraz

Steffen Rendle

Li Zhang

Lichan Hong

John Anderson

ICLR 2019

We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale.
We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement both in training time and generalization performance compared to sampling methods.
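The identity behind the method is easy to verify numerically: the quadratic penalty over all user-item pairs equals a trace of the product of two Gramians, which can be computed without enumerating pairs (a toy check, not the paper's streaming Gramian estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((50, 8))   # user embeddings (n = 50, k = 8)
V = rng.standard_normal((70, 8))   # item embeddings (m = 70, k = 8)

# Naive penalty: sum of <u_i, v_j>^2 over ALL n*m pairs -- O(n m k).
naive = ((U @ V.T) ** 2).sum()

# Gramian form: trace(Gu Gv) with Gu = U^T U, Gv = V^T V -- O((n + m) k^2).
gram = np.trace((U.T @ U) @ (V.T @ V))
```

The two quantities agree because the sum of squared inner products is the squared Frobenius norm of `U @ V.T`, and the trace form costs O((n+m)k²) instead of O(nmk), which is why no sampling of unobserved pairs is needed.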

Randomized Fractal Expansions for Production-Scale Public Collaborative-Filtering Data Sets

Francois Belletti

John Anderson

Karthik Singaram Lakshmanan

Nicolas Mayoraz

Pankaj Kanwar

Taylor Robie

Tayo Oguntebi

Yi-fan Chen

arXiv (2019)

Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems.
In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets.
Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices.
Preserving such properties is key to building large realistic synthetic data sets which can be employed reliably to benchmark recommender systems and the systems employed to train them.
We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users.
The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down.
Furthermore, we present collaborative filtering experiments demonstrating that the generated synthetic data yields valuable insights for machine learning at scale in recommender systems.
We provide code pointers to reproduce our data and our experiments.

Adaptive Averaging in Accelerated Descent Dynamics

Alexandre Bayen

Peter Bartlett

29th Conference on Neural Information Processing Systems (NIPS 2016)

We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.

Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

Maximilian Balandat

Claire Tomlin

Alexandre Bayen

Advances in Neural Information Processing Systems 29 (NIPS 2016)

We study a general adversarial online learning problem, in which we are given a decision set X' in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X', based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either the set S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.

Continuous and Discrete Dynamics for Online Learning and Convex Optimization

Ph.D. Thesis, University of California, Berkeley (2016)

Online learning and convex optimization algorithms have become essential tools for solving problems in modern machine learning, statistics and engineering. Many algorithms for online learning and convex optimization can be interpreted as a discretization of a continuous-time process, and studying the continuous-time dynamics offers many advantages: the analysis is often simpler and more elegant in continuous time, it provides insights and leads to new interpretations of the discrete process, and it streamlines the design of new algorithms, obtained by deriving the dynamics in continuous time, then discretizing. In this thesis, we apply this paradigm to two problems: the study of decision dynamics for online learning in games, and the design and analysis of accelerated methods for convex optimization.
In the first part of the thesis, we study online learning dynamics for a class of games called non-atomic convex potential games, which are used for example to model congestion in transportation and communication networks. We make a connection between the discrete Hedge algorithm for online learning, and an ODE on the simplex, known as the replicator dynamics. We study the asymptotic properties of the ODE, then by discretizing the ODE and using results from stochastic approximation theory, we derive a new class of online learning algorithms with asymptotic convergence guarantees. We further give a more refined analysis of these dynamics and their convergence rates. Then, using the Hedge algorithm as a model of decision dynamics, we pose and study two related problems: the problem of estimating the learning rates of the Hedge algorithm given observations on its sequence of decisions, and the problem of optimal control under Hedge dynamics.
In the second part, we study first-order accelerated dynamics for constrained convex optimization. We develop a method to design an ODE for the problem using an inverse Lyapunov argument: we start from an energy function that encodes the constraints of the problem and the desired convergence rate, then design an ODE tailored to that energy function. Then, by carefully discretizing the ODE, we obtain a family of accelerated algorithms with optimal rate of convergence. This results in a unified framework to derive and analyze most known first-order methods, from gradient descent and mirror descent to their accelerated versions. We give different interpretations of the ODE, inspired from physics and statistics. In particular, we give an averaging interpretation of accelerated dynamics, and derive simple sufficient conditions on the averaging scheme to guarantee a given rate of convergence. We also develop an adaptive averaging heuristic that empirically speeds up the convergence, and in many cases performs significantly better than popular heuristics such as restarting.

On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game

Kiet Lam

Alexandre Bayen

ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS 2016)

The routing game models congestion in transportation networks, communication networks, and other cyber-physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network), then the joint decision of all players determines the costs of each path, which are then revealed to the players. We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the learning model parameters. We consider in particular entropic mirror descent dynamics and reduce the problem to estimating the learning rates of each player. We demonstrate this method using data collected from a routing game experiment played by human participants: we develop a web application to implement the routing game. When players log in, they are assigned an origin and destination on the graph. At each iteration, they can choose a distribution over their available routes, and each player seeks to minimize her own cost. We collect a data set using this interface, then apply the proposed method to estimate the learning model parameters. We observe in particular that after an exploration phase, the joint decision of the players remains within a small distance of the Nash equilibrium. We also use the estimated model parameters to predict the flow distribution over routes, and compare these predictions to the actual distribution. Finally, we discuss some of the qualitative implications of the experiments, and give directions for future research.
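The entropic mirror descent (Hedge) dynamics studied here can be simulated on a toy two-route example (made-up cost functions, not the experiment's network):

```python
import numpy as np

def hedge_routing(eta=0.3, T=500):
    """Hedge (entropic mirror descent) for a nonatomic routing game with
    two parallel routes and toy congestion costs c1(x) = x and
    c2(y) = 0.5 + 0.5 y. The Nash equilibrium routes 2/3 of the flow
    on route 1, where both costs equal 2/3."""
    x = np.array([0.5, 0.5])                       # initial flow split
    for t in range(1, T + 1):
        costs = np.array([x[0], 0.5 + 0.5 * x[1]])
        x = x * np.exp(-eta / np.sqrt(t) * costs)  # multiplicative update
        x /= x.sum()                               # renormalize to a flow
    return x

x = hedge_routing()   # approaches the equilibrium split [2/3, 1/3]
```

The estimation problem in the paper works in the reverse direction: given the observed sequence of splits `x`, recover the learning rate `eta` that best explains the updates.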

On Social Optimal Routing Under Selfish Learning

Milena Suarez Castillo

Alexandre Bayen

IEEE Transactions on Control of Network Systems (2016)

We consider a repeated routing game over a finite horizon with partial control under selfish response, in which a central authority can control a fraction of the flow and seeks to improve a network-wide objective, while the remaining flow applies an online learning algorithm. This finite-horizon control problem is inspired by the one-shot Stackelberg routing game. Our setting is different in that we do not assume that the selfish players play a Nash equilibrium; instead, we assume that they apply an online learning algorithm. This results in an optimal control problem under learning dynamics. We propose different methods for approximately solving this problem: a greedy algorithm and a mirror descent algorithm based on the adjoint method. In particular, we derive the adjoint system equations of the Hedge dynamics and show that they can be solved efficiently. We compare the performance of these methods (in terms of achieved cost and computational complexity) on parallel networks, and on a model of the Los Angeles highway network.

Online Learning of Nash Equilibria in Congestion Games

We study the repeated, nonatomic congestion game, in which multiple populations of players share resources and make, at each iteration, a decentralized decision on which resources to utilize. We investigate the following question: given a model of how individual players update their strategies, does the resulting dynamics of strategy profiles converge to the set of Nash equilibria of the one-shot game? We consider in particular a model in which players update their strategies using algorithms with sublinear discounted regret. We show that the resulting sequence of strategy profiles converges to the set of Nash equilibria in the sense of Cesàro means. However, convergence of the actual sequence is not guaranteed in general. We show that it can be guaranteed for a class of algorithms with a sublinear discounted regret and which satisfy an additional condition. We call such algorithms AREP (approximate replicator) algorithms, as they can be interpreted as a discrete-time approximation of the replicator equation, which models the continuous-time evolution of population strategies, and which is known to converge for the class of congestion games.

Differential Privacy of Populations in Routing Games

Roy Dong

Alexandre Bayen

S. Shankar Sastry

IEEE 54th Annual Conference on Decision and Control (CDC 2015), pp. 2798-2803

As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspect of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that this spatiotemporal information can easily be used as the basis for inferences about a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these with simulations on a small example.

Accelerated Mirror Descent in Continuous and Discrete Time

Alexandre Bayen

Peter Bartlett

Advances in Neural Information Processing Systems 28 (NIPS 2015), pp. 2845-2853

We study accelerated mirror descent dynamics in continuous and discrete time. Combining the original continuous-time motivation of mirror descent with a recent ODE interpretation of Nesterov's accelerated method, we propose a family of continuous-time descent dynamics for convex functions with Lipschitz gradients, such that the solution trajectories are guaranteed to converge to the optimum at an O(1/t^2) rate. We then show that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at an O(1/k^2) rate. This connection between accelerated mirror descent and the ODE provides an intuitive approach to the design and analysis of accelerated first-order algorithms.
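In the Euclidean special case, discretizing the ODE recovers a Nesterov-style method; here is a minimal sketch on a toy quadratic (not the general mirror-descent construction of the paper):

```python
import numpy as np

def accelerated_gd(grad, x0, step, T=200):
    """Euclidean discretization of the accelerated dynamics: each
    iterate mixes a gradient step with a momentum-weighted average of
    past iterates, using the standard (k - 1)/(k + 2) weights."""
    x = y = np.asarray(x0, dtype=float)
    for k in range(1, T + 1):
        x_next = y - step * grad(y)                    # gradient step at y
        y = x_next + (k - 1) / (k + 2) * (x_next - x)  # momentum/averaging
        x = x_next
    return x

# Toy quadratic f(x) = 0.5 x^T A x with condition number 10.
A = np.diag([1.0, 10.0])
x = accelerated_gd(lambda z: A @ z, [5.0, 5.0], step=0.1)
```

The `(k - 1)/(k + 2)` weights are the discrete counterpart of the averaging weights in the continuous-time dynamics, and they are what yields the O(1/k²) rate instead of the O(1/k) rate of plain gradient descent.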

The Hedge Algorithm on a Continuum

Maximilian Balandat

Claire Tomlin

Alexandre Bayen

32nd International Conference on Machine Learning (ICML 2015)

We consider an online optimization problem on a subset S of R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x^(t) over S, and seeks to minimize a cumulative expected loss, where each loss is a Lipschitz function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show an $O(\sqrt{t} \log t)$ bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization to the dual averaging method on the set of Lebesgue-continuous distributions over S.

Efficient Bregman Projections onto the Simplex

Syrine Krichene

Alexandre Bayen

IEEE 54th Annual Conference on Decision and Control (CDC 2015)

We consider the problem of projecting a vector onto the simplex, using a Bregman projection. This is a common problem in first-order methods for convex optimization and online-learning algorithms, such as mirror descent. We derive the KKT conditions of the projection problem, and show that for Bregman divergences induced by ω-potentials, one can efficiently compute the solution using a bisection method. More precisely, an ε-approximate projection can be obtained in $O(d \log(1/\epsilon))$. We also consider a class of exponential potentials for which the exact solution can be computed efficiently, and give an $O(d \log d)$ deterministic algorithm and an $O(d)$ randomized algorithm to compute the projection. In particular, we show that one can generalize the KL divergence to a Bregman divergence which is bounded on the simplex (unlike the KL divergence), strongly convex with respect to the l1 norm, and for which one can still solve the projection in expected linear time.
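The bisection idea can be sketched in the Euclidean special case, where the Bregman projection onto the simplex reduces to finding a scalar dual variable (a sketch only; the paper handles general ω-potential divergences):

```python
import numpy as np

def simplex_projection_bisect(y, iters=100, tol=1e-10):
    """Euclidean projection of y onto the probability simplex via
    bisection on the dual variable nu: x_i = max(y_i - nu, 0), with nu
    chosen so that sum(x) = 1. The sum is decreasing in nu, so
    bisection applies; the paper generalizes this threshold search to
    Bregman divergences induced by omega-potentials."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min() - 1.0, y.max()   # sum >= 1 at lo, sum = 0 at hi
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.maximum(y - nu, 0.0).sum() > 1.0:
            lo = nu                    # threshold too low: mass too large
        else:
            hi = nu
        if hi - lo < tol:
            break
    return np.maximum(y - 0.5 * (lo + hi), 0.0)

x = simplex_projection_bisect([0.2, 1.5, -0.3])  # projects to [0, 1, 0]
```

Each bisection step costs O(d), and halving the interval until it is shorter than ε gives the O(d log 1/ε) bound quoted in the abstract.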

A PDE-ODE Model for a Junction with Ramp Buffer

Maria-Laura Delle Monache

Jack Reilly

Samitha Samaranayake

Paula Goatin

Alexandre Bayen

SIAM Journal on Applied Mathematics, 74 (2014), pp. 22-39

We consider the Lighthill-Whitham-Richards traffic flow model on a junction composed of one mainline, an onramp, and an offramp, which are connected by a node. The onramp dynamics is modeled using an ordinary differential equation describing the evolution of the queue length. The definition of the solution of the Riemann problem at the junction is based on an optimization problem and the use of a right-of-way parameter. The numerical approximation is carried out using a Godunov scheme, modified to take into account the effects of the onramp buffer. We present the results of some simulations and numerically check the convergence of the method.
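A minimal Godunov scheme for the LWR equation on a single road segment looks as follows (toy parameters, without the junction and onramp-buffer coupling studied in the paper):

```python
import numpy as np

def godunov_lwr(rho0, steps=100, dx=0.01, dt=0.004):
    """Godunov scheme for the LWR model rho_t + (rho (1 - rho))_x = 0
    with the Greenshields flux f(rho) = rho (1 - rho). The CFL condition
    requires dt <= dx / max|f'(rho)| = dx. Boundary cells are held
    fixed; all parameters here are illustrative."""
    f = lambda r: r * (1.0 - r)
    rho = np.array(rho0, dtype=float)    # copy; do not modify the input
    rho_c = 0.5                          # critical density where f peaks
    for _ in range(steps):
        # Godunov flux at each cell interface = min(demand, supply).
        demand = f(np.minimum(rho[:-1], rho_c))  # what the left cell can send
        supply = f(np.maximum(rho[1:], rho_c))   # what the right cell can take
        flux = np.minimum(demand, supply)
        rho[1:-1] += dt / dx * (flux[:-1] - flux[1:])
    return rho

# Riemann problem: free flow upstream, congestion downstream.
rho0 = np.where(np.arange(100) < 50, 0.2, 0.8)
rho = godunov_lwr(rho0)
```

The paper's modification enters at one interface: the onramp buffer changes the demand term at the junction node according to the queue-length ODE and the right-of-way parameter.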

On the Convergence of No-Regret Learning in Selfish Routing

Benjamin Drighès

Alexandre Bayen

31st International Conference on Machine Learning (ICML 2014), pp. 163-171

We study the repeated, non-atomic routing game, in which selfish players make a sequence of routing decisions. We consider a model in which players use regret-minimizing algorithms as the learning mechanism, and study the resulting dynamics. We are concerned in particular with the convergence to the set of Nash equilibria of the routing game. No-regret learning algorithms are known to guarantee convergence of a subsequence of population strategies. We are concerned with convergence of the actual sequence. We show that convergence holds for a large class of online learning algorithms, inspired by the continuous-time replicator dynamics. In particular, the discounted Hedge algorithm is proved to belong to this class, which guarantees its convergence.

Stability of Nash Equilibria in the Congestion Game under Replicator Dynamics

Benjamin Drighes

Alexandre Bayen

IEEE 53rd Annual Conference on Decision and Control (CDC 2014)

We consider the single-commodity non-atomic congestion game, in which the player population is assumed to obey the replicator dynamics. We study the resulting rest points, and relate them to the Nash equilibria of the one-shot congestion game. The rest points of the replicator dynamics, also called evolutionarily stable points, are known to coincide with a superset of Nash equilibria, called restricted equilibria. By studying the spectrum of the linearized system around rest points, we show that Nash equilibria are locally asymptotically stable stationary points. We also show that under the additional assumption of strictly increasing congestion functions, Nash equilibria are exactly the set of exponentially stable points. We illustrate these results on numerical examples.