Yishay Mansour
Prof. Yishay Mansour received his PhD from MIT in 1990, after which he was
a postdoctoral fellow at Harvard and a Research Staff Member at the IBM T.
J. Watson Research Center. Since 1992 he has been at Tel-Aviv University,
where he is currently a Professor of Computer Science; he served
as the first head of the Blavatnik School of Computer Science during
2000-2002. He was the founder and first director of the Israeli Center of Research
Excellence in Algorithms.
Prof. Mansour has published over 100 journal papers and over 200
proceedings papers in various areas of computer science, with special
emphasis on machine learning, algorithmic game theory, communication
networks, and theoretical computer science, and has supervised over a
dozen graduate students in those areas.
Prof. Mansour was named an ACM Fellow in 2014. He is currently an
associate editor of a number of distinguished journals and has served on
numerous conference program committees. He was the program chair
of COLT (1998) and of STOC (2016), has served twice on the COLT
steering committee, and is a member of the ALT steering committee.
Research Areas
Authored Publications
Preview abstract
Simple, sufficient explanations furnished by short decision lists can be useful for guiding stakeholder actions. Unfortunately, this transparency can come at the expense of the higher accuracy enjoyed by black box methods, like deep nets. To date, practitioners typically either (i) insist on the simpler model, forsaking accuracy; or (ii) insist on maximizing accuracy, settling for post-hoc explanations of dubious faithfulness. In this paper, we propose a hybrid \emph{partially interpretable model} that represents a compromise between the two extremes. In our setup, each input is first processed by a decision list that can either execute a decision or abstain, handing off authority to the opaque model. The key to optimizing the decision list is to optimally trade off the accuracy of the composite system against coverage (the fraction of the population that receives explanations). We contribute a new principled algorithm for constructing partially interpretable decision lists, providing theoretical guarantees addressing both interpretability and accuracy. As an instance of our result, we prove that when the optimal decision list has length $k$, coverage $c$, and $b$ mistakes, our algorithm will generate a decision list that has length no greater than $4k$, coverage at least $c/2$, and makes at most $4b$ mistakes. Finally, we empirically validate the effectiveness of the new model.
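As a rough illustration of the setup described above (an illustrative sketch, not the paper's algorithm), the following Python snippet shows a decision list whose rules either fire a prediction or abstain and hand the input off to a black-box model; the rule representation and the coverage computation are assumptions made for the example.

# Illustrative sketch only: a decision list that predicts or abstains,
# deferring abstentions to an opaque black-box model.

def decision_list_predict(rules, x):
    """rules: list of (condition, label) pairs; returns None to abstain."""
    for condition, label in rules:
        if condition(x):
            return label              # transparent, explainable decision
    return None                       # abstain: hand off to the black box

def hybrid_predict(rules, black_box, x):
    y = decision_list_predict(rules, x)
    if y is not None:
        return y, True                # covered: explained by the decision list
    return black_box(x), False        # not covered: opaque prediction

def evaluate(rules, black_box, data):
    """Accuracy of the composite system and coverage of the decision list."""
    correct = covered = 0
    for x, y in data:
        pred, was_covered = hybrid_predict(rules, black_box, x)
        correct += int(pred == y)
        covered += int(was_covered)
    n = len(data)
    return correct / n, covered / n

The paper's contribution is the algorithm for choosing the rules so that accuracy and coverage are jointly near-optimal; the sketch only shows how such a composite model is applied and evaluated.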
Preview abstract
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: stochastic trees and deterministic decision processes with a finite horizon.
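To make the Stackelberg structure concrete, here is a minimal sketch (a toy illustration, not the paper's approximation algorithm) of the agent's best response in a finite-horizon deterministic decision process: given the principal's offered bonus, the agent maximizes the sum of the original and offered rewards by backward induction, and the principal's reward and bonus expenditure can then be read off the induced trajectory. The state/action encoding and reward names are assumptions made for the example.

# Toy illustration: agent best response to an offered bonus in a
# finite-horizon deterministic decision process (not the paper's algorithm).

def best_response(states, actions, step, agent_r, bonus, horizon):
    """step(s, a) -> next state; agent_r and bonus map (s, a) to a reward."""
    V = {s: 0.0 for s in states}              # value-to-go after the last step
    policy = [dict() for _ in range(horizon)]
    for t in reversed(range(horizon)):
        V_next, V = V, {}
        for s in states:
            def total(a):
                return agent_r[(s, a)] + bonus.get((s, a), 0.0) + V_next[step(s, a)]
            best_a = max(actions, key=total)
            policy[t][s] = best_a
            V[s] = total(best_a)
    return policy

def principal_outcome(policy, step, principal_r, bonus, horizon, start):
    """Principal's reward along the agent's trajectory, and the bonus spent."""
    s, reward, spent = start, 0.0, 0.0
    for t in range(horizon):
        a = policy[t][s]
        reward += principal_r[(s, a)]
        spent += bonus.get((s, a), 0.0)
        s = step(s, a)
    return reward, spent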
Preview abstract
We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an
$\widetilde{O}(H^{2.5} \sqrt{T|S||A| (\mathcal{R}(\mathcal{O}) + H \log(\delta^{-1}))})$ regret guarantee,
with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon and $\mathcal{R}(\mathcal{O}) = \mathcal{R}(\mathcal{O}_{\mathrm{sq}}^\mathcal{F}) + \mathcal{R}(\mathcal{O}_{\mathrm{log}}^\mathcal{P})$
is the sum of the regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient and rate optimal regret minimization algorithm for adversarial CMDPs which operates under the minimal and standard assumption of online function approximation. Our technique relies on standard convex optimization algorithms, and we show that it is robust to approximation errors.
Preview abstract
Uniswap v3 is a decentralized exchange (DEX) that allows liquidity providers to allocate funds more efficiently by specifying an active price interval for their funds. This introduces the problem of finding an optimal strategy for choosing price intervals. We formalize this problem as an online learning problem with non-stochastic rewards. We use regret-minimization methods to derive a liquidity provision strategy that guarantees a lower bound on the reward. This holds even under adversarial changes to asset pricing, and we express the bound in terms of the trading volume.
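For intuition only, here is a minimal sketch of the kind of regret-minimization approach the abstract refers to: multiplicative weights (Hedge) over a finite menu of candidate price intervals, with per-round rewards supplied by the environment. The interval discretization, the [0, 1] reward normalization, and the full-information feedback are assumptions of the sketch; the paper's actual strategy and its volume-dependent guarantee are not reproduced here.

# Illustrative sketch: Hedge (multiplicative weights) over candidate price
# intervals for liquidity provision, with adversarially chosen rewards.
import math
import random

def hedge_over_intervals(intervals, reward_fn, T, eta=0.1):
    """intervals: list of (low, high) price intervals.
    reward_fn(t, interval) -> reward in [0, 1], revealed after round t."""
    weights = [1.0] * len(intervals)
    total_reward = 0.0
    for t in range(T):
        z = sum(weights)
        probs = [w / z for w in weights]
        i = random.choices(range(len(intervals)), weights=probs, k=1)[0]
        rewards = [reward_fn(t, iv) for iv in intervals]   # full-information feedback
        total_reward += rewards[i]
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    return total_reward

Standard Hedge analysis bounds the gap to the best fixed interval in hindsight; relating that benchmark and the learning rate to trading volume is the subject of the paper.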
Preview abstract
An abundance of recent impossibility results establishes that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for \emph{swap regret}, and thus, along the way, imply convergence to a \emph{correlated} equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of \emph{weighted} regret minimization, with \emph{unknown} weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.
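For background on the swap-regret objective mentioned above, the sketch below implements the classical reduction from swap regret to external regret in the much simpler single-state (normal-form) setting: one multiplicative-weights learner per action, combined through a stationary distribution of the resulting stochastic matrix. It is illustrative background only, not the decentralized Markov-game algorithm of the paper.

# Background sketch: swap-regret minimization for a single-state game via one
# multiplicative-weights learner per action (classical reduction), not the
# paper's Markov-game algorithm.
import math

def stationary(P, iters=1000):
    """Stationary distribution p with p = pP, computed by power iteration."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(p)
        p = [x / total for x in p]
    return p

class SwapRegretLearner:
    def __init__(self, n_actions, eta=0.1):
        self.n = n_actions
        self.eta = eta
        # One weight vector (an external-regret learner) per action.
        self.weights = [[1.0] * n_actions for _ in range(n_actions)]

    def play(self):
        """Distribution over actions for the current round."""
        rows = [[w / sum(row) for w in row] for row in self.weights]
        return stationary(rows)

    def update(self, losses, play_dist):
        """losses[j]: loss of action j this round; play_dist: output of play()."""
        for i in range(self.n):
            for j in range(self.n):
                # Learner i sees the losses scaled by how much weight the played
                # distribution put on action i.
                self.weights[i][j] *= math.exp(-self.eta * play_dist[i] * losses[j])

Playing the stationary distribution ensures that the overall swap regret is bounded by the sum of the per-learner external regrets, which is the property that convergence to a correlated equilibrium rests on.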
Preview abstract
A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio (loss, algorithm, model) otherwise praised for its high-precision machinery. Hundreds of papers followed up on the two suspected culprits: the loss (for being convex) and/or the algorithm (for fitting a classical boosting blueprint). Here, we call on the half-century-old founding theory of losses for class probability estimation (properness), an extension of Long and Servedio's results, and a new general boosting algorithm to demonstrate that the real culprit in their specific context was in fact the (linear) model class. We advocate for a more general standpoint on the problem, as we argue that the source of the negative result lies in the dark side of a pervasive -- and otherwise prized -- aspect of ML: \textit{parameterisation}.
Preview abstract
Blackwell's celebrated theory measures approachability using the $\ell_2$ (Euclidean) distance. In many applications such as regret minimization, it is often more useful to study approachability under other distance metrics, most commonly the $\ell_\infty$ metric. However, the time and space complexity of the algorithms designed for $\ell_\infty$ approachability depend on the dimension of the space of the vectorial payoffs, which is often prohibitively large. We present a framework for converting high-dimensional $\ell_\infty$ approachability problems to low-dimensional \emph{pseudonorm} approachability problems, thereby resolving such issues. We first show that the $\ell_\infty$ distance between the average payoff and the approachability set can be equivalently defined as a \emph{pseudodistance} between a lower-dimensional average vector payoff and a new convex set that we define. Next, we develop an algorithmic theory of pseudonorm approachability analogous to previous work on norm approachability, showing that it can be achieved via online linear optimization (OLO) over a convex set given by the Fenchel dual of the unit pseudonorm ball. We then use this to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$ approachability algorithm whose convergence is independent of the dimension of the original vector payoff. We further show that this algorithm admits polynomial-time complexity, assuming that the original $\ell_\infty$-distance can be computed efficiently. We also give an $\ell_\infty$ approachability algorithm whose convergence is logarithmic in that dimension, using an FTRL algorithm with a maximum-entropy regularizer. Finally, we illustrate the benefits of our framework by applying it to several problems in regret minimization.
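As a concrete example of the connection to regret minimization mentioned above (a standard fact, not the paper's construction): with $N$ actions, loss vectors $\ell_t$, and $a_t$ the action played at round $t$, define the vector payoff $u_t(i) = \ell_t(a_t) - \ell_t(i)$. The $\ell_\infty$ distance of the average payoff $\bar u_T$ from the nonpositive orthant is then exactly the average external regret (clipped at zero):

$$ d_\infty\big(\bar u_T,\ \mathbb{R}^N_{\le 0}\big) \;=\; \Big[\max_i \tfrac{1}{T}\sum_{t=1}^{T} \big(\ell_t(a_t) - \ell_t(i)\big)\Big]_+ , $$

so approaching $\mathbb{R}^N_{\le 0}$ in $\ell_\infty$ is equivalent to sublinear regret, and the dimension of the payoff vector equals the number of actions $N$, which illustrates why dimension-independent convergence is desirable.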
Preview abstract
We study a generalization of boosting to the multiclass setting.
We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being “slightly better than random guessing”. We give a simple and efficient boosting algorithm that does not require realizability assumptions and whose sample and oracle complexity bounds are independent of the number of classes. Furthermore, we utilize our new boosting technique in two fundamental settings: multiclass PAC learning and List PAC learning, resulting in simplified algorithms compared to previous works.
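For reference, the “original notion of weak learnability” referred to above is the classical binary condition (the multiclass condition introduced in the paper is not reproduced here): for every distribution $D$ over the sample, the weak learner returns a hypothesis $h$ with

$$ \Pr_{(x,y)\sim D}\big[h(x) \ne y\big] \;\le\; \tfrac{1}{2} - \gamma $$

for some edge $\gamma > 0$, i.e., accuracy slightly better than random guessing; boosting amplifies this small edge into arbitrarily high accuracy.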
Preview abstract
We consider a seller faced with buyers who have the ability to delay their decision, an ability we call patience. Each buyer's type is composed of value and patience, and it is sampled i.i.d. from a distribution. The seller, using posted prices, would like to maximize her revenue from selling to the buyer. In this paper, we formalize this setting and characterize the resulting Stackelberg equilibrium, where the seller first commits to her strategy, and then the buyers best respond. Following this, we show how to compute both the optimal pure and mixed strategies. We then consider a learning setting, where the seller does not have access to the distribution over buyers' types. Our main results are the following. We derive a sample complexity bound for learning an approximately optimal pure strategy, by computing the fat-shattering dimension of this setting. Moreover, we provide a general sample complexity bound for the approximately optimal mixed strategy. We also consider an online setting and derive a vanishing regret bound with respect to both the optimal pure strategy and the optimal mixed strategy.
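To illustrate the interaction described above, here is a toy model of the buyer's best response (the paper's exact timing, discounting, and tie-breaking conventions may differ): the seller commits to a sequence of posted prices, and a buyer with value $v$ and patience $\tau$ buys at the cheapest price posted within their patience window, provided it does not exceed $v$.

# Toy illustration: a buyer with (value, patience) best-responding to a committed
# posted-price sequence; the conventions here are assumptions, not the paper's model.
def buyer_best_response(prices, value, patience):
    """Return (buy_time, price_paid), or None if the buyer never buys."""
    window = prices[: patience + 1]        # prices the buyer is willing to wait for
    best_t = min(range(len(window)), key=lambda t: window[t])
    return (best_t, window[best_t]) if window[best_t] <= value else None

def average_revenue(prices, buyer_types):
    """Average revenue of a posted-price sequence over sampled (value, patience) types."""
    total = 0.0
    for value, patience in buyer_types:
        outcome = buyer_best_response(prices, value, patience)
        if outcome is not None:
            total += outcome[1]
    return total / len(buyer_types)

In the learning results above, the seller only sees samples (or online feedback) of such types rather than the distribution itself, and must choose the pricing strategy accordingly.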
Preview abstract
We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror-descent and least squares policy evaluation in an auxiliary MDP used to compute exploration bonuses.
Our algorithm obtains a $\widetilde O(K^{6/7})$ regret bound, improving significantly over the previous state of the art of $\widetilde O (K^{14/15})$ in this setting.
In addition, we present a version of the same algorithm under the assumption that a simulator of the environment is available to the learner (but otherwise no exploratory assumptions are made), and prove that it obtains a state-of-the-art regret of $\widetilde O (K^{2/3})$.
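For orientation, a generic form of the mirror-descent (exponential-weights) policy update with exploration bonuses, of the kind the algorithm above combines with least-squares policy evaluation, is

$$ \pi_{k+1}(a \mid s) \;\propto\; \pi_k(a \mid s)\,\exp\!\big(-\eta\,\big(\widehat Q_k(s,a) - B_k(s,a)\big)\big), $$

where $\widehat Q_k$ is an estimate of the action-value function under the adversarial costs and $B_k$ is an exploration bonus; the paper's specific estimators, bonus construction via the auxiliary MDP, and learning-rate choices are not reproduced here.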