Walid Krichene

Walid Krichene

Walid Krichene received his Ph.D. in Electrical Engineering and Computer Sciences from the University of California at Berkeley. His research focuses on optimization theory and differential privacy, and their applications to recommendation systems. He received the M.S. in Applied Mathematics from the Ecole des Mines Paristech and the M.A. in Mathematics from the University of California, Berkeley. He previously worked at Criteo labs, the CDC department at Caltech, and the CAOR group at the Ecole des Mines. Walid received the Best Paper Award at KDD 2020, the Leon Chua Award for outstanding achievement in nonlinear science, the Valedictorian Prize for the Tunisian Baccalaureate, and a Bronze Medal at the Pan African Math Olympiads.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Revisiting the Performance of iALS on Item Recommendation Benchmarks
    Steffen Rendle
    Li Zhang
    Yehuda Koren
    Proceedings of the 16th ACM Conference on Recommender Systems, Association for Computing Machinery(2022), pp. 427-435
    Preview abstract Matrix factorization learned by implicit alternating least squares (iALS) is a popular baseline in recommender system research publications. iALS is known to be one of the most computationally efficient and scalable collaborative filtering methods. However, recent studies suggest that its prediction quality is not competitive with the current state of the art, in particular autoencoders and other item-based collaborative filtering methods. In this work, we revisit four well-studied benchmarks where iALS was reported to perform poorly and show that with proper tuning, iALS is highly competitive and outperforms any method on at least half of the comparisons. We hope that these high quality results together with iALS's known scalability spark new interest in applying and further improving this decade old technique. View details
    Preview abstract We study the problem of differentially private (DP) matrix completion under user-level privacy. We design an $(\epsilon,\delta)$-joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) best known privacy/utility trade-off both theoretically, as well as on benchmark data sets. In particular, despite non-convexity of low-rank matrix completion and ALS, we provide the first global convergence analysis of ALS with {\em noise} introduced to ensure DP. For $n$ being the number of users and $m$ being the number of items in the rating matrix, our analysis requires only about $\log (n+m)$ samples per user (ignoring rank, condition number factors) and obtains a sample complexity of $n=\tilde\Omega(m/(\sqrt{\zeta}\cdot \epsilon))$ to ensure relative Frobenius norm error of $\zeta$. This improves significantly on the previous state of the result of $n=\tilde\Omega\left(m^{5/4}/(\zeta^{5}\epsilon)\right)$ for the private-FW method by ~\citet{jain2018differentially}. Furthermore, we extensively validate our method on synthetic and benchmark data sets (MovieLens 10mi, MovieLens 20mi), and observe that private ALS only suffers a 6 percentage drop in accuracy when compared to the non-private baseline for $\epsilon\leq 10$. Furthermore, compared to prior work of~\cite{jain2018differentially}, it is at least better by 10 percentage for all choice of the privacy parameters. View details
    Neural Collaborative Filtering vs. Matrix Factorization Revisited
    Steffen Rendle
    Li Zhang
    John Roberts Anderson
    Recsys 2020
    Preview abstract Embedding based models have been the state of the art in collaborative filtering for over a decade. Traditionally, the dot product or higher order equivalents have been used to combine two or more embeddings, e.g., most notably in matrix factorization. In recent years, it was suggested to replace the dot product with a learned similarity e.g. using a multilayer perceptron (MLP). This approach is often referred to as neural collaborative filtering (NCF). In this work, we revisit the experiments of the NCF paper that popularized learned similarities using MLPs. First, we show that with a proper hyperparameter selection, a simple dot product substantially outperforms the proposed learned similarities. Second, while a MLP can in theory approximate any function, we show that it is non-trivial to learn a dot product with an MLP. Finally, we discuss practical issues that arise when applying MLP based similarities and show that MLPs are too costly to use for item recommendation in production environments while dot products allow to apply very efficient retrieval algorithms. We conclude that MLPs should be used with care as embedding combiner and that dot products might be a better default choice. View details
    Preview abstract Recent results have shown that for two-layer fully connected neural networks, gradient flow converges to a global optimum in the infinite width limit, by making a connection between the mean field dynamics and the Wasserstein gradient flow. These results were derived for first-order gradient flow, and a natural question is whether second-order dynamics, i.e., dynamics with momentum, exhibit a similar guarantee. We show that the answer is positive for the heavy ball method. In this case, the resulting integro-PDE is a nonlinear kinetic Fokker Planck equation, and unlike the first-order case, it has no apparent connection with the Wasserstein gradient flow. Instead, we study the variations of a Lyapunov functional along the solution trajectories to characterize the stationary points and to prove convergence. While our results are asymptotic in the mean field limit, numerical simulations indicate that global convergence may already occur for reasonably small networks. View details
    Superbloom: Bloom filter meets Transformer
    John Roberts Anderson
    Qingqing Huang
    Steffen Rendle
    Li Zhang
    Google(2020)
    Preview abstract We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method to solving problems with large vocabulary size. View details
    Preview abstract The task of item recommendation requires ranking a large catalogue of items given a context. Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items. To speed up the computation of metrics, recent work often uses sampled metrics where only a smaller set of random items and the relevant items are ranked. This paper investigates sampled metrics in more detail and shows that they are inconsistent with their exact version, in the sense that they do not persist relative statements, e.g., recommender A is better than B, not even in expectation. Moreover, the smaller the sampling size, the less difference there is between metrics, and for very small sampling size, all metrics collapse to the AUC metric. We show that it is possible to improve the quality of the sampled metrics by applying a correction, obtained by minimizing different criteria such as bias or mean squared error. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. To summarize, our work suggests that sampling should be avoided for metric calculation, however if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate. View details
    Rankmax: An Adaptive Projection Alternative to the Softmax Function
    Weiwei Kong
    Nicolas Mayoraz
    Steffen Rendle
    Li Zhang
    (2020)
    Preview abstract Several machine learning models involve mapping a score vector to a probability vector. Usually, this is done by projecting the score vector onto a probability simplex, and such projections are often characterized as Lipschitz continuous approximations of the argmax function, whose Lipschitz constant is controlled by a parameter that is similar to a softmax temperature. The aforementioned parameter has been observed to affect the quality of these models and is typically either treated as a constant or decayed over time. In this work, we propose a method that adapts this parameter to individual training examples. The resulting method exhibits desirable properties, such as sparsity of its support and numerically efficient implementation, and we find that it significantly outperforms competing non-adaptive projection methods. In our analysis, we also derive the general solution of (Bregman) projections onto the (n, k)-simplex, a result which may be of independent interest. View details
    Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions
    Francois Belletti
    Karthik Singaram Lakshmanan
    Nicolas Mayoraz
    Yi-fan Chen
    John Anderson
    Taylor Robie
    Tayo Oguntebi
    Amit Bleiwess
    Dan Shirron
    arXiv(2019)
    Preview abstract Recommender System research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate large-scale User/Item interaction data sets by expanding pre-existing public data sets. Our key contribution is a technique that expands User/Item incidence matrices matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users. The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down. View details
    Efficient Training on Very Large Corpora via Gramian Estimation
    Nicolas Mayoraz
    Steffen Rendle
    Li Zhang
    Lichan Hong
    John Anderson
    ICLR 2019 (to appear)
    Preview abstract We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale. We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement both in training time and generalization performance compared to sampling methods. View details
    Randomized Fractal Expansions for Production-Scale Public Collaborative-Filtering Data Sets
    Francois Belletti
    John Anderson
    Karthik Singaram Lakshmanan
    Nicolas Mayoraz
    Pankaj Kanwar
    Taylor Robie
    Tayo Oguntebi
    Yi-fan Chen
    Arxiv(2019)
    Preview abstract Recommender system research suffers from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap, we propose to generate large-scale user/item interaction data sets by expanding pre-existing public data sets. Our key contribution is a technique that expands user/item incidence matrices to large numbers of rows (users), columns (items), and non-zero values (interactions). The proposed method adapts Kronecker Graph Theory to preserve key higher order statistical properties such as the fat-tailed distribution of user engagements, item popularity, and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which can be employed reliably to benchmark recommender systems and the systems employed to train them. We further apply our stochastic expansion algorithm to the binarized MovieLens 20M data set, which comprises 20M interactions between 27K movies and 138K users. The resulting expanded data set has 1.2B ratings, 2.2M users, and 855K items, which can be scaled up or down. Furthermore, we present collaborative filtering experiments demonstrating that the generated synthetic data entails valuable insights for machine learning at scale in recommender systems. We provide code pointers to reproduce our data and our experiments. View details
    Acceleration and Averaging in Stochastic Descent Dynamics
    Peter Bartlett
    30th Conference on Neural Information Processing Systems (NIPS)(2017)
    Preview abstract We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions. Building on an averaging formulation of accelerated mirror descent, we propose a stochastic variant in which the gradient is contaminated by noise, and study the resulting stochastic differential equation. We prove a bound on the rate of change of an energy function associated with the problem, then use it to derive estimates of convergence rates of the function values (almost surely and in expectation), both for persistent and asymptotically vanishing noise. We discuss the interaction between the parameters of the dynamics (learning rate and averaging rates) and the covariation of the noise process. In particular, we show how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate. View details
    Adaptive Averaging in Accelerated Descent Dynamics
    Alexandre Bayen
    Peter Bartlett
    29th Conference on Neural Information Processing Systems (NIPS)(2016)
    Preview abstract We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases. View details
    Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games
    Maximilian Balandat
    Claire Tomlin
    Alexandre Bayen
    Advances in Neural Information Processing Systems 29 (NIPS)(2016)
    Preview abstract We study a general adversarial online learning problem, in which we are given a decision set X' in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X', based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either the set S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency. View details
    Continuous and Discrete Dynamics for Online Learning and Convex Optimization
    Ph.D. Thesis, University of California, Berkeley(2016)
    Preview abstract Online learning and convex optimization algorithms have become essential tools for solving problems in modern machine learning, statistics and engineering. Many algorithms for online learning and convex optimization can be interpreted as a discretization of a continuous time process, and studying the continuous time dynamics offers many advantages: the analysis is often simpler and more elegant in continuous time, it provides insights and leads to new interpretations of the discrete process, and streamlines the design of new algorithms, obtained by deriving the dynamics in continuous time, then discretizing. In this thesis, we apply this paradigm to two problems: the study of decision dynamics for online learning in games, and the design and analysis of accelerated methods for convex optimization. In the first part of the thesis, we study online learning dynamics for a class of games called non-atomic convex potential games, which are used for example to model congestion in transportation and communication networks. We make a connection between the discrete Hedge algorithm for online learning, and an ODE on the simplex, known as the replicator dynamics. We study the asymptotic properties of the ODE, then by discretizing the ODE and using results from stochastic approximation theory, we derive a new class of online learning algorithms with asymptotic convergence guarantees. We further give a more refined analysis of these dynamics and their convergence rates. Then, using the Hedge algorithm as a model of decision dynamics, we pose and study two related problems: the problem of estimating the learning rates of the Hedge algorithm given observations on its sequence of decisions, and the problem of optimal control under Hedge dynamics. In the second part, we study first-order accelerated dynamics for constrained convex optimization. We develop a method to design an ODE for the problem using an inverse Lyapunov argument: we start from an energy function that encodes the constraints of the problem and the desired convergence rate, then design an ODE tailored to that energy function. Then, by carefully discretizing the ODE, we obtain a family of accelerated algorithms with optimal rate of convergence. This results in a unified framework to derive and analyze most known first-order methods, from gradient descent and mirror descent to their accelerated versions. We give different interpretations of the ODE, inspired from physics and statistics. In particular, we give an averaging interpretation of accelerated dynamics, and derive simple sufficient conditions on the averaging scheme to guarantee a given rate of convergence. We also develop an adaptive averaging heuristic that empirically speeds up the convergence, and in many cases performs significantly better than popular heuristics such as restarting. View details
    On Learning How Players Learn: Estimation of Learning Dynamics in the Routing Game
    Kiet Lam
    Alexandre Bayen
    ACM/IEEE 7th International Conference on Cyber-Physical Systems (ICCPS)(2016)
    Preview abstract The routing game models congestion in transportation networks, communication networks, and other cyber physical systems in which agents compete for shared resources. We consider an online learning model of player dynamics: at each iteration, every player chooses a route (or a probability distribution over routes, which corresponds to a flow allocation over the physical network), then the joint decision of all players determines the costs of each path, which are then revealed to the players. We pose the following estimation problem: given a sequence of player decisions and the corresponding costs, we would like to estimate the learning model parameters. We consider in particular entropic mirror descent dynamics, reduce the problem to estimating the learning rates of each player. We demonstrate this method using data collected from a routing game experiment, played by human participants: We develop a web application to implement the routing game. When players log in, they are assigned an origin and destination on the graph. They can choose, at each iteration, a distribution over their available routes, and each player seeks to minimize her own cost. We collect a data set using this interface, then apply the proposed method to estimate the learning model parameters. We observe in particular that after an exploration phase, the joint decision of the players remains within a small distance of the Nash equilibrium. We also use the estimated model parameters to predict the flow distribution over routes, and compare these predictions to the actual distribution. Finally, we discuss some of the qualitative implications of the experiments, and give directions for future research. View details
    On Social Optimal Routing Under Selfish Learning
    Milena Suarez Castillo
    Alexandre Bayen
    IEEE Transactions on Control of Network Systems(2016)
    Preview abstract We consider a repeated routing game over a finite horizon with partial control under selfish response, in which a central authority can control a fraction of the flow and seeks to improve a network-wide objective, while the remaining flow applies an online learning algorithm. This finite horizon control problem is inspired from the one-shot Stackelberg routing game. Our setting is different in that we do not assume that the selfish players play a Nash equilibrium; instead, we assume that they apply an online learning algorithm. This results in an optimal control problem under learning dynamics. We propose different methods for approximately solving this problem: A greedy algorithm and a mirror descent algorithm based on the adjoint method. In particular, we derive the adjoint system equations of the Hedge dynamics and show that they can be solved efficiently. We compare the performance of these methods (in terms of achieved cost and computational complexity) on parallel networks, and on a model of the Los Angeles highway network. View details
    Online Learning of Nash Equilibria in Congestion Games
    Benjamin Drighès
    Alexandre Bayen
    SIAM Journal on Control and Optimization(2015)
    Preview abstract We study the repeated, nonatomic congestion game, in which multiple populations of players share resources and make, at each iteration, a decentralized decision on which resources to utilize. We investigate the following question: given a model of how individual players update their strategies, does the resulting dynamics of strategy profiles converge to the set of Nash equilibria of the one-shot game? We consider in particular a model in which players update their strategies using algorithms with sublinear discounted regret. We show that the resulting sequence of strategy profiles converges to the set of Nash equilibria in the sense of Cesàro means. However, convergence of the actual sequence is not guaranteed in general. We show that it can be guaranteed for a class of algorithms with a sublinear discounted regret and which satisfy an additional condition. We call such algorithms AREP (approximate replicator) algorithms, as they can be interpreted as a discrete-time approximation of the replicator equation, which models the continuous-time evolution of population strategies, and which is known to converge for the class of congestion games. View details
    Differential Privacy of Populations in Routing Games
    Roy Dong
    Alexandre Bayen
    S. Shankar Sastry
    IEEE 54th Annual Conference on Decision and Control (CDC)(2015), pp. 2798-2803
    Preview abstract As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspect of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that this spatiotemporal information can easily be used as the basis for inferences for a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these with simulations on a small example. View details
    Accelerated Mirror Descent in Continuous and Discrete Time
    Alexandre Bayen
    Peter Bartlett
    Advances in neural information processing systems 28 (NIPS)(2015), pp. 2845-2853
    Preview abstract We study accelerated mirror descent dynamics in continuous and discrete time. Combining the original continuous-time motivation of mirror descent with a recent ODE interpretation of Nesterov's accelerated method, we propose a family of continuous-time descent dynamics for convex functions with Lipschitz gradients, such that the solution trajectories are guaranteed to converge to the optimum at a O(1/t^2) rate. We then show that a large family of first-order accelerated methods can be obtained as a discretization of the ODE, and these methods converge at a O(1/k^2) rate. This connection between accelerated mirror descent and the ODE provides an intuitive approach to the design and analysis of accelerated first-order algorithms. View details
    The Hedge Algorithm on a Continuum
    Maximilian Balandat
    Claire Tomlin
    Alexandre Bayen
    32nd International Conference on Machine Learning (ICML)(2015)
    Preview abstract We consider an online optimization problem on a subset S of R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x^(t) over S, and seeks to minimize a cumulative expected loss, where each loss is a Lipschitz function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show a O(\sqrtt \log t) bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization to the dual averaging method on the set of Lebesgue-continuous distributions over S. View details
    Efficient Bregman Projections onto the Simplex
    Syrine Krichene
    Alexandre Bayen
    IEEE 54th Annual Conference on Decision and Control (CDC)(2015)
    Preview abstract We consider the problem of projecting a vector onto the simplex, using a Bregman projection. This is a common problem in first-order methods for convex optimization and online-learning algorithms, such as mirror descent. We derive the KKT conditions of the projection problem, and show that for Bregman divergences induced by ω-potentials, one can efficiently compute the solution using a bisection method. More precisely, an epsilon-approximate projection can be obtained in O(d log 1/epsilon). We also consider a class of exponential potentials for which the exact solution can be computed efficiently, and give a O(d log d) deterministic algorithm and O(d) randomized algorithm to compute the projection. In particular, we show that one can generalize the KL divergence to a Bregman divergence which is bounded on the simplex (unlike the KL divergence), strongly convex with respect to the l1 norm, and for which one can still solve the projection in expected linear time. View details
    A PDE-ODE Model for a Junction with Ramp Buffer
    Maria-Laura Delle Monache
    Jack Reilly
    Samitha Samaranayake
    Paula Goatin
    Alexandre Bayen
    SIAM Journal on Applied Mathematics, 74(2014), pp. 22-39
    Preview abstract We consider the Lighthill--Whitham--Richards traffic flow model on a junction composed by one mainline, an onramp, and an offramp, which are connected by a node. The onramp dynamics is modeled using an ordinary differential equation describing the evolution of the queue length. The definition of the solution of the Riemann problem at the junction is based on an optimization problem and the use of a right-of-way parameter. The numerical approximation is carried out using a Godunov scheme, modified to take into account the effects of the onramp buffer. We present the result of some simulations and numerically check the convergence of the method. View details
    On the Convergence of No-Regret Learning in Selfish Routing
    Benjamin Drighès
    Alexandre Bayen
    31st International Conference on Machine Learning (ICML)(2014), pp. 163-171
    Preview abstract We study the repeated, non-atomic routing game, in which selfish players make a sequence of routing decisions. We consider a model in which players use regret-minimizing algorithms as the learning mechanism, and study the resulting dynamics. We are concerned in particular with the convergence to the set of Nash equilibria of the routing game. No-regret learning algorithms are known to guarantee convergence of a subsequence of population strategies. We are concerned with convergence of the actual sequence. We show that convergence holds for a large class of online learning algorithms, inspired from the continuous-time replicator dynamics. In particular, the discounted Hedge algorithm is proved to belong to this class, which guarantees its convergence. View details
    Stability of Nash Equilibria in the Congestion Game under Replicator Dynamics
    Benjamin Drighes
    Alexandre Bayen
    IEEE 53rd Annual Conference on Decision and Control (CDC)(2014)
    Preview abstract We consider the single commodity non-atomic congestion game, in which the player population is assumed to obey the replicator dynamics. We study the resulting rest points, and relate them to the Nash equilibria of the one-shot congestion game. The rest points of the replicator dynamics, also called evolutionary stable points, are known to coincide with a superset of Nash equilibria, called restricted equilibria. By studying the spectrum of the linearized system around rest points, we show that Nash equilibria are locally asymptotically stable stationary points. We also show that under the additional assumption of strictly increasing congestion functions, Nash equilibria are exactly the set of exponentially stable points. We illustrate these results on numerical examples. View details