# Morteza Zadimoghaddam

I am a staff research scientist (L6) at Google Cambridge office in the US. Prior to moving to Cambridge, I spent two years (2018 and 2019) in the Zurich office and 4 years in the New York office where I started my career at Google in January 2014. Prior to Google, I did my PhD in computer science at MIT (CSAIL) under supervision of Professor Erik D. Demaine.
I work on applying optimization techniques to various practical problems in order to find provably efficient algorithms. In particular, I apply infrastructure optimization methods to save computational resources at scale. On the mathematical and research side, I am interested in Submodular Optimization and its applications in large scale data mining and machine learning problems.

Authored Publications

Google Publications

Other Publications

Sort By

Deletion Robust Submodular Maximization over Matroids

Ashkan Norouzi Fard

Federico Fusco

Paul Duetting

ICML'22(2022)

Preview abstract
Maximizing a monotone submodular function is a fundamental task in machine learning. In this paper we study the deletion robust version of the problem under the classic matroids constraint. Here the goal is to extract a small size summary of the dataset that contains a high value independent set even after an adversary deleted some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(3.582+O(\eps))$-approximation algorithm with summary size $O(k + \frac{d \log k}{\eps^2})$. In the streaming setting we provide a $(5.582+O(\eps))$-approximation algorithm with summary size and memory $O(k + \frac{d \log k}{\eps^2})$. We complement our theoretical results with an in-depth experimental analysis showing the effectiveness of our algorithms on real-world datasets.
View details

Edge-Weighted Online Bipartite Matching

Runzhou Tao

Zhiyi Huang

Journal of the ACM, 69(2022), 45:1-45:35

Preview abstract
Online bipartite matching is one of the most fundamental problems in the online algorithms literature. Karp, Vazirani, and Vazirani (STOC 1990) introduced an elegant algorithm for the unweighted problem that achieves an optimal competitive ratio of 1 - 1/e. Aggarwal et al. (SODA 2011) later generalized their algorithm and analysis to the vertex-weighted case. Little is known, however, about the most general edge-weighted problem aside from the trivial 1/2-competitive greedy algorithm. In this paper, we present the first online algorithm that breaks the long standing 1/2 barrier and achieves a competitive ratio of at least 0.5086. In light of the hardness result of Kapralov, Post, and Vondrák (SODA 2013) that restricts beating a 1/2 competitive ratio for the more general problem of monotone submodular welfare maximization, our result can be seen as strong evidence that edge-weighted bipartite matching is strictly easier than submodular welfare maximization in the online setting.
The main ingredient in our online matching algorithm is a novel subroutine called online correlated selection (OCS), which takes a sequence of pairs of vertices as input and selects one vertex from each pair. Instead of using a fresh random bit to choose a vertex from each pair, the OCS negatively correlates decisions across different pairs and provides a quantitative measure on the level of correlation. We believe our OCS technique is of independent interest and will find further applications in other online optimization problems.
View details

Distributed load balancing: a new framework and improved guarantees

Allen Liu

Binghui Peng

Innovations in Theoretical Computer Science(2021)

Preview abstract
Inspired by applications on search engines and web servers, we consider a load balancing problem with a general \textit{convex} objective function. In this problem, we are given a bipartite graph on a set of sources $S$ and a set of workers $W$ and the goal is to distribute the load from each source among its neighboring workers such that the total load of workers are as balanced as possible.
We present a new distributed algorithm that works with \textit{any} symmetric non-decreasing convex function for evaluating the balancedness of the workers' load.
Our algorithm computes a nearly optimal allocation of loads in $O(\log n \log^2 d/\eps^3)$ rounds where $n$ is the number of nodes, $d$ is the maximum degree, and $\eps$ is the desired precision. If the objective is to minimize the maximum load, we modify the algorithm to obtain a nearly optimal solution in $O(\log n \log d/\eps^2)$ rounds. This improves a line of algorithms that require a polynomial number of rounds in $n$ and $d$ and appear to encounter a fundamental barrier that prevents them from obtaining poly-logarithmic runtime
\cite{berenbrink2005dynamic, berenbrink2009new, subramanian1994analysis, rabani1998local}. In our paper, we introduce a novel primal-dual approach with multiplicative weight updates that allows us to circumvent this barrier. Our algorithm is inspired by \cite{agrawal2018proportional} and other distributed algorithms for optimizing linear objectives but introduces several new twists to deal with general convex objectives.
View details

Sliding Window Algorithms for k-Clustering Problems

Michele Borassi

Neurips 2020 (to appear)

Preview abstract
The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems such as $k$-means and $k$-median. In this setting, we give simple and practical algorithms that come with stronger performance guarantees than previously known results. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with cost only slightly worse than those returned by algorithms that have access to the full dataset.
View details

Fully Dynamic Algorithm for Constrained Submodular Optimization

Ashkan Norouzi Fard

Jakub Tarnawski

Slobodan Mitrović

NeurIPS 2020 (to appear)

Preview abstract
The task of maximizing a monotone submodular function under a cardinality constraint is at the core of many machine learning and data mining applications, including data summarization, sparse regression and coverage problems. We study this problem in the context of fully dynamic streams, where elements can be both inserted and removed.
Our main result is a randomized algorithm that maintains an efficient data structure with a poly-logarithmic ammortized update time and returns a (1/2 - \epsilon)-approximate solution.
We complement our theoretical analysis with an empirical study of the performance of our algorithm.
View details

Scalable diversity maximization via small-size composable core-sets

31st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)(2019)

Preview abstract
Maximizing diversity is a central problem in information re-
trieval and data mining systems with prominent applications
in web search, recommender systems, news aggregators and
product search. In this paper, we study a diversity maxi-
mization problem (a.k.a. maximum dispersion problem) in
which given a set of n objects in a metric space, one wants to
find a subset of k objects with the maximum sum of pairwise
distances.
To solve this problem in a scalable distributed manner, we
apply a novel distributed framework for tackling large-scale
problems known as randomized composable core-sets: divide
the big data set into smaller parts, solve the problem for
each part, combine the solutions from each part, and solve
the problem on the union of these solutions. Our algorithms
improve significantly over the approximation guarantees of
state-of-the-art core-set-based algorithms while using min-
imum possible intermediate output size. In particular, we
present a simple distributed algorithm that achieves an al-
most optimal communication complexity, and moreover, it
asymptotically achieves approximation factor of 1/2 which
is the best possible approximation factor for the global opti-
mization problem under certain complexity theory assump-
tions.
Our algorithms are scalable and practical as shown by
our extensive empirical evaluation with large datasets and
they can be easily adapted to the major distributed comput-
ing systems like MapReduce. Furthermore, we show empir-
ically that, in real-life instances, our algorithms reach close-
to-optimal solutions with approximation factor of > 90%.
This approximation factor is far exceeding the approxima-
tion barrier for the problem and provide useful output.
View details

Better Sliding Window Algorithms to Maximize Subadditive and Diversity Objectives

Michele Borassi

Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019)

Preview abstract
The streaming computation model is a standard model for large-scale data analysis: the input arrives one element at a time, and the goal is to maintain an approximately optimal solution using only a constant, or, at worst, poly-logarithmic space.
In practice, however, recency plays a large role, and one often wishes to consider only the last w elements that have arrived, the so-called sliding window problem. A trivial approach is to simply store the last w elements in a buffer; our goal is to develop algorithms with space and update time sublinear in w. In this regime, there are two frameworks: exponential histograms and smooth histograms, which can be used to obtain sliding window algorithms for families of functions satisfying certain properties.
Unfortunately, these frameworks have limitations and cannot always be applied directly. A prominent example is the case of a submodular function with cardinality constraints. Some of these difficulties can be rectified, but often only on a case-by-case basis. Here, we describe an alternative approach to designing efficient sliding window algorithms for maximization problems. Then we instantiate this approach on a wide range of problems, yielding better algorithms for submodular function optimization, diversity optimization and general subadditive optimization. In doing so, we improve state-of-the art results obtained using problem-specific algorithms.
View details

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity

Proceedings of the 36th International Conference on Machine Learning(2019), pp. 1833-1842

Preview abstract
Submodular maximization is a general optimization problem with a wide range of applications in machine learning (e.g., active learning, clustering, and feature selection). In large-scale optimization, the parallel running time of an algorithm is governed by its adaptivity, which measures the number of sequential rounds needed if the algorithm can execute polynomially-many independent oracle queries in parallel. While low adaptivity is ideal, it is not sufficient for an algorithm to be efficient in practice—there are many applications of distributed submodular optimization where the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of submodular maximization. In this paper, we give the first constant-factor approximation algorithm for maximizing a nonmonotone submodular function subject to a cardinality constraint k that runs in O(log(n)) adaptive rounds and makes O(n log(k)) oracle queries in expectation. In our empirical study, we use three real-world applications to compare our algorithm with several benchmarks for non-monotone submodular maximization. The results demonstrate that our algorithm finds competitive solutions using significantly fewer rounds and queries.
View details

Submodular Maximization with Nearly Optimal Approximation, Adaptivity and Query Complexity

Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms(2019), pp. 255-273

Preview abstract
Submodular optimization generalizes many classic problems in combinatorial optimization and has recently found a wide range of applications in machine learning (e.g., feature engineering and active learning). For many large-scale optimization problems, we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds where polynomially-many independent function evaluations can be executed in parallel. While low adaptivity is ideal, it is not sufficient for a distributed algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of adaptive submodular optimization.
Our main result is a distributed algorithm for maximizing a monotone submodular function with cardinality constraint k that achieves a (1 − 1/e − ε)-approximation in expectation. This algorithm runs in O(log(n)) adaptive rounds and makes O(n) calls to the function evaluation oracle in expectation. The approximation guarantee and query complexity are optimal, and the adaptivity is nearly optimal. Moreover, the number of queries is substantially less than in previous works. We also extend our results to the submodular cover problem to demonstrate the generality of our algorithm and techniques.
View details

Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Amin Karbasi

Ehsan Kazemi

Thirty-fifth International Conference on Machine Learning, ICML 2018

Preview abstract
Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation. We cast this problem as an instance of a deletion-robust submodular maximization where part of the data may be deleted due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against \textit{any} number of adversarial deletions. We extensively evaluate the performance of our algorithms against prior state-of-the-art on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors.
View details

Consistent Hashing with Bounded Loads

Mikkel Thorup

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms(2018), pp. 587-604

Preview abstract
Designing algorithms for balanced allocation of clients to servers in dynamic settings is a challenging problem for a variety of reasons. Both servers and clients may be added and/or removed from the system periodically, and the main objectives of allocation algorithms are: the uniformity of the allocation, and the number of moves after adding or removing a server or a client. The most popular solution for our dynamic settings is Consistent Hashing. However, the load balancing of consistent hashing is no better than a random assignment of clients to servers, so with n of each, we expect many servers to be overloaded with Θ(logn/loglogn) clients. In this paper, with n clients and n servers, we get a guaranteed max-load of 2 while only moving an expected constant number of clients for each update.
We take an arbitrary user specified balancing parameter c=1+ϵ>1. With m balls and n bins in the system, we want no load above ⌈cm/n⌉. Meanwhile we want to bound the expected number of balls that have to be moved when a ball or server is added or removed. Compared with general lower bounds without capacity constraints, we show that in our algorithm when a ball or bin is inserted or deleted, the expected number of balls that have to be moved is increased only by a multiplicative factor O(1ϵ2) for ϵ≤1 (Theorem 4) and by a factor 1+O(logcc) for ϵ≥1 (Theorem 3). Technically, the latter bound is the most challenging to prove. It implies that we for superconstant c only pay a negligible cost in extra moves. We also get the same bounds for the simpler problem where we instead of a user specified balancing parameter have a fixed bin capacity C for all bins.
View details

Proportion allocation: Simple, Distributed, and Diverse Matching with high Entropy

Shipra Agrawal

The 35th International Conference on Machine Learning (ICML 2018)

Preview abstract
Inspired by many application of bipartite matchings in online advertising and machine learning, we study a simple and natural iterative proportional allocation algorithm:
Maintain a priority score $\priority_a$ for each node $a\in\advertisers$ on one side of the bipartition, initialized as $\priority_a=1$. Iteratively allocate each node $i\in \impressions$ on the other side to eligible nodes in $\advertisers$ in proportion of their priority scores. After each round, for each node $a\in \advertisers$, decrease or increase the score $\priority_a$ based on whether it is over- or under- allocated. Our first result is that this simple, distributed algorithm converges to a $(1-\epsilon)$-approximate fractional $b$-matching solution in $O({\log n\over \epsilon^2} )$ rounds. We also extend the proportional allocation algorithm and convergence results to the maximum weighted matching problem, and show that the algorithm can be naturally tuned to produce maximum matching with {\em high entropy}. High entropy, in turn, implies additional desirable properties of this matching, e.g., it satisfies certain diversity and fairness (aka anonymity) properties that are desirable in a variety of applications in online advertising and machine learning.
View details

Data Summarization at Scale: A Two-Stage Submodular Approach

Amin Karbasi

Ehsan Kazemi

Marko Mitrovic

Proceedings of the 35th International Conference on Machine Learning, ICML 2018, PMLR

Preview abstract
The sheer scale of modern datasets has resulted in a dire need for summarization techniques that identify representative elements in a dataset, and then use them to construct a drastically smaller subset which still encodes a similar amount of information. Fortunately, the vast majority of data
summarization tasks satisfy an intuitive diminishing returns condition known as submodularity,
which allows us to find nearly-optimal solutions in linear time.
We focus on a two-stage submodular framework where the goal is to use some given training functions to reduce the ground set so that optimizing new functions (drawn from the same distribution) over the reduced set provides almost as much value as optimizing them over the entire ground set. In this paper, we develop the first streaming and distributed solutions to this problem. In addition to providing strong theoretical guarantees, we demonstrate both the utility and efficiency of our algorithms on real-world tasks including image summarization and ride-share optimization.
View details

ONLINE SUBMODULAR WELFARE MAXIMIZATION: GREEDY BEATS 1/2 IN RANDOM ORDER

SIAM Journal on Computing, 47(3)(2018), pp. 1056-1086

Preview abstract
In the submodular welfare maximization (SWM) problem, the input consists of a set of n items, each of which must be allocated to one of m agents. Each agent ell has a valuation function v_ell, where v_ell(S) denotes the welfare obtained by this agent if she receives the set of items S. The functions v_ell are all submodular; as is standard, we assume that they are monotone and v_ell(∅) = 0. The goal is to partition the items into m disjoint subsets S1, S2, . . . , Sm in order to maximize the social welfare, defined as \Sum_ell v_ell(S_ell). A simple greedy algorithm gives a 1/2-approximation to SWM in the offline setting, and this was the best known until Vondr´ak’s recent (1 − 1/e)-approximation algorithm. In this paper, we consider the online version of SWM. Here, items arrive one at a time in an online manner; when an item arrives, the algorithm must make an irrevocable decision about which agent to assign it to before seeing any subsequent items. This problem is motivated by applications to Internet advertising, where user ad impressions must be allocated to advertisers whose value is a submodular function of the set of users/impressions they receive. There are two natural models that differ in the order in which items arrive. In the fully adversarial setting, an adversary can construct an arbitrary/worst-case instance, as well as pick the order in which items arrive in order to minimize the algorithm’s performance. In this setting, the 1/2-competitive greedy algorithm is the best possible. To improve on this, one must weaken the adversary slightly: In the random order model, the adversary can construct a worst-case set of items and valuations but does not control the order in which the items arrive; instead, they are assumed to arrive in a random order. The random order model has been well studied for online SWM and various special cases, but the best known competitive ratio (even for several special cases) is 1/2 + 1/n, which is barely better than the ratio for the adversarial order. Obtaining a competitive ratio of 1/2 + Ω(1) for the random order model has been an important open problem for several years. We solve this open problem by demonstrating that the greedy algorithm has a competitive ratio of at least 0.505 for online SWM in the random order model. This is the first result showing a competitive ratio bounded above 1/2 in the random order model, even for special cases such as the weighted matching or budgeted allocation problem (without the so-called large capacity assumptions). For special cases of submodular functions including weighted matching, weighted coverage functions, and a broader class of “second-order supermodular” functions, we provide a different analysis that gives a competitive ratio of 0.51. We analyze the greedy algorithm using a factor-revealing linear program, bounding how the assignment of one item can decrease potential welfare from assigning future items. In addition to our new competitive ratios for online SWM, we make two further contributions: First, we define the classes of second-order modular, supermodular, and submodular functions, which are likely to be of independent interest in submodular optimization.
Second, we obtain an improved competitive ratio via a technique we refer to as gain linearizing,
which may be useful in other contexts: Essentially, we linearize the submodular function by dividing the gain of an optimal solution into gain from individual elements, compare the algorithm’s gain when it assigns an element to the optimal solution’s gain from the element, and, crucially, bound the extent to which assigning elements can affect the potential gain of other elements.
View details

Submodular Optimization Over Sliding Windows

Proceedings of the 26th International World Wide Web Conference, WWW(2017)

Preview abstract
Maximizing submodular functions under cardinality constraints lies at
the core of numerous data mining and machine learning applications,
including data diversification, data summarization, and coverage problems.
In this work, we study this question in the context of data
streams, where elements arrive one at a time, and we want to
design low-memory and fast update-time algorithms that maintain a good solution.
Specifically, we focus on the sliding window
model, where we are asked to maintain a solution that considers only
the last $W$ items.
In this context, we provide the first non-trivial algorithm that
maintains a provable approximation of the optimum using space sublinear in the size of the window.
In particular we give a $\nicefrac{1}{3} - \epsilon$ approximation algorithm that uses space polylogarithmic in
the spread of the values of the elements,
$\Spread$, and linear in the solution size $k$ for any constant $\epsilon > 0$. At the same
time, processing each element only requires a
polylogarithmic number of evaluations of the function itself. When a
better approximation is desired, we show a different algorithm that, at the cost of using more memory, provides a $\nicefrac{1}{2} - \epsilon$
approximation, and allows a tunable trade-off between average update time and space. This algorithm matches the best known approximation guarantees for submodular optimization in insertion-only streams, a less general formulation of the problem.
We demonstrate the efficacy of the algorithms on a number of real
world datasets, showing that their practical performance far exceeds
the theoretical bounds. The algorithms preserve high quality solutions in streams with millions of items, while storing a
negligible fraction of them.
View details

Bicriteria Distributed Submodular Maximization in a Few Rounds

29th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)(2017)

Preview abstract
We study the problem of efficiently optimizing submodular functions under
cardinality constraints in distributed setting. Recently, several
distributed algorithms for this problem have been introduced
which either achieve a sub-optimal solution or they run in super-constant
number
of rounds of computation. Unlike previous work, we aim
to design distributed algorithms in multiple rounds
with almost optimal approximation guarantees
at the cost of outputting a larger number of elements.
Toward this goal, we present a distributed algorithm that, for any \epsilon >
0
and any constant r,
outputs a set
S of O(rk/\epsilon^{1\over r}) items in r rounds, and achieves
a (1-\epsilon)-approximation of the value of the optimum set with k items.
This is the first
distributed algorithm
that achieves an approximation factor of (1-\epsilon) running in less than
$\log {1\over \epsilon}$ number of rounds.
We also prove a hardness result showing that the output of any $1-\epsilon$ approximation distributed algorithm limited to one distributed round should have at least $\Omega(k/\epsilon)$ items. In light of this hardness result, our distributed algorithm in one round, $r=1$, is asymptotically tight in terms of the output size.
We support the theoretical guarantees
with an extensive empirical study of our
algorithm showing that achieving almost optimum solutions is indeed possible in
a few rounds for large-scale real datasets.
View details

Overcommitment in Cloud Services – Bin Packing with Chance Constraints

Maxime Cohen

Phil Keller

ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems 2017

Preview abstract
This paper considers the traditional problem of resource allocation, i.e., scheduling jobs on machines. One such recent application is cloud computing, where jobs arrive in an online fashion with capacity requirements and need to be allocated to physical machines. It is often observed that the requested capacities are not fully utilized, hence offering an opportunity to employ an overcommitment policy, i.e., selling resources beyond capacity. Setting the right overcommitment level can induce a significant cost reduction for the cloud provider, while taking a very low risk of violating capacity constraints. We introduce and study a model that quantifies the value of overcommitment by modeling the problem as a bin packing with chance constraints. We then propose an alternative formulation that transforms each chance constraint into a sub-modular function. We show that our model captures the risk pooling effect and allows to guide scheduling and overcommitment decisions. We also develop a family of online algorithms that are intuitive, easy to implement and provide a constant factor guarantee from optimal. Finally, we calibrate our model using realistic workload data and test our approach in a practical setting. Our analysis and experiments illustrate the benefit of overcommitting in cloud services.
View details

Scalable Feature Selection via Distributed Diversity Maximization

Sepehr Abbasi Zadeh

Mehrdad Ghadiri

AAAI(2017), pp. 2876-2883

Preview abstract
Feature selection is a fundamental problem in machine learning
and data mining. The majority of feature selection algorithms
are designed for running on a single machine (centralized
setting) and they are less applicable to very large
datasets. Although there are some distributed methods to
tackle this problem, most of them are distributing the data
horizontally which are not suitable for datasets with a large
number of features and few number of instances. Thus, in this
paper, we introduce a novel vertically distributable feature selection
method in order to speed up this process and be able
to handle very large datasets in a scalable manner. In general,
feature selection methods aim at selecting relevant and
non-redundant features (Minimum Redundancy and Maximum
Relevance). It is much harder to consider redundancy
in a vertically distributed setting than a centralized setting
since there is no global access to the whole data. To the best
of our knowledge, this is the first attempt toward solving the
feature selection problem with a vertically distributed filter
method which handles the redundancy with consistently comparable
results with centralized methods. In this paper, we
formalize the feature selection problem as a diversity maximization
problem by introducing a mutual-information-based
metric distance on the features. We show the effectiveness of
our method by performing an extensive empirical study. In
particular, we show that our distributed method outperforms
state-of-the-art centralized feature selection algorithms on a
variety of datasets. From a theoretical point of view, we have
proved that the used greedy algorithm in our method achieves
an approximation factor of 1/4 for the diversity maximization
problem in a distributed setting with high probability. Furthermore,
we improve this to 8/25 expected approximation
using multiplicity in our distribution.
View details

Bicriteria Distributed Submodular Maximization in a Few Rounds

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures(2017)

Preview abstract
We study the problem of efficiently optimizing submodular functions under cardinality constraints in distributed setting. Recently, several distributed algorithms for this problem have been introduced which either achieve a sub-optimal solution or they run in super-constant number of rounds of computation. Unlike previous work, we aim to design distributed algorithms in multiple rounds with almost optimal approximation guarantees at the cost of outputting a larger number of elements. Toward this goal, we present a distributed algorithm that, for any ε > 0 and any constant r, outputs a set S of O(rk/ε^(1/r)) items in r rounds, and achieves a (1-ε)-approximation of the value of the optimum set with k items. This is the first distributed algorithm that achieves an approximation factor of (1-ε) running in less than log 1/ε number of rounds. We also prove a hardness result showing that the output of any 1-ε approximation distributed algorithm limited to one distributed round should have at least Ω(k/ε) items. In light of this hardness result, our distributed algorithm in one round, r = 1, is asymptotically tight in terms of the output size. We support the theoretical guarantees with an extensive empirical study of our algorithm showing that achieving almost optimum solutions is indeed possible in a few rounds for large-scale real datasets.
View details

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

Aditya Bhaskara

Jason Altschuler

ICML(2016) (to appear)

Preview abstract
The problem of matrix column subset selection
has recently attracted a large body of research,
with feature selection serving as one obvious and
important application. Among the techniques
that have been applied to solve this problem, the
greedy algorithm has been shown to be quite
effective in practice. However, our theoretical
guarantees on its performance have not been ex-
plored thoroughly, especially in a distributed set-
ting. In this paper, we study the greedy algorithm
for the column subset selection problem from a
theoretical and empirical perspective and show
its effectiveness in a distributed setting. In par-
ticular, we provide an improved approximation
guarantee for the greedy algorithm, and present
the first distributed implementation of this algo-
rithm with provable approximation factors. We
use the idea of randomized composable core-
sets, developed recently in the context of sub-
modular maximization. Finally, we validate the
effectiveness of this distributed algorithm via an
empirical study.
View details

Horizontally Scalable Submodular Maximization

Andreas Krause

International Conference on Machine Learning(2016)

Preview abstract
A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity – the number of instances that can fit in memory of each machine – must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization when the capacity of each machine is fixed. The proposed framework applies to a broad class of algorithms and a variety of constraints. We provide theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that the algorithm achieves performance competitive with the centralized greedy solution.
View details

Revenue Maximization with Nonexcludable Goods

Preview
Nima Haghpanah

Transactions on Economics and Computation(2015)

Submodular secretary problems with extensions

MohammadTaghi Hajiaghayi

ACM Transactions on Algorithms, 9 (4)(2013)

Preview abstract
Online auction is the essence of many modern markets, particularly networked markets, in which information about goods, agents, and outcomes is revealed over a period of time, and the agents must make irrevocable decisions without knowing future information. Optimal stopping theory, especially the classic secretary problem, is a powerful tool for analyzing such online scenarios which generally require optimizing an objective function over the input. The secretary problem and its generalization the multiple-choice secretary problem were under a thorough study in the literature. In this article, we consider a very general setting of the latter problem called the submodular secretary problem, in which the goal is to select k secretaries so as to maximize the expectation of a (not necessarily monotone) submodular function which deﬁnes efficiency of the selected secretarial group based on their overlapping skills. We present the ﬁrst constant-competitive algorithm for this case. In a more general setting in which selected secretaries should form an independent (feasible) set in each of $l$ given matroids as well, we obtain an O(l log^2 r)-competitive algorithm generalizing several previous results, where $r$ is the maximum rank of the matroids. Another generalization is to consider $l$ knapsack constraints (i.e., a knapsack constraint assigns a nonnegative cost to each secretary, and requires that the total cost of all the secretaries employed be no more than a budget value) instead of the matroid constraints, for which we present an O(l)-competitive algorithm. In a sharp contrast, we show for a more general setting of subadditive secretary problem, there is no o~(\sqrt n) competitive algorithm and thus submodular functions are the most general functions to consider for constant-competitiveness in our setting. We complement this result by giving a matching O(\sqrt n) competitive algorithm for the subadditive case. At the end, we consider some special cases of our general setting as well.
View details

Submodular secretary problems with extensions

Preview
MohammadTaghi Hajiaghayi

ACM Transactions on Algorithms, 9 (4)(2013)

Revenue Maximization with Nonexcludable Goods

Preview
Nima Haghpanah

Internet and Network Economics - 9th International Workshop, WINE 2013, Springer

Bicriteria Online Matching: Maximizing Weight and Cardinality

International Conference on Web and Internet Economics (WINE)(2013), pp. 305-318

Preview abstract
Inspired by online ad allocation problems,
many results have been developed for
online matching problems.
Most of the previous work deals with a single objective,
but, in practice, there is a need to optimize multiple objectives.
Here, as an illustrative example motivated by display ads allocation,
we study a bi-objective online matching problem.
In particular, we consider a set of fixed nodes (ads) with capacity
constraints, and a set of online items (pageviews) arriving one by
one. Upon arrival of an online item $i$, a set of eligible fixed
neighbors (ads) for the item is revealed, together with a weight
$w_{ia}$ for eligible neighbor $a$. The problem is to assign each item to
an eligible neighbor online, while respecting the capacity
constraints; the goal is to maximize both the total weight of the
matching and the cardinality.
In this paper, we present both approximation algorithms and
hardness results for this problem.
An $(\alpha, \beta)$-approximation for this problem is a matching with
weight at least $\alpha$ fraction of the maximum
weighted matching, and cardinality
at least $\beta$ fraction of maximum cardinality matching.
We present a parametrized approximation
algorithm that allows a smooth tradeoff curve between the two
objectives: when the capacities of fixed nodes are
large, we give a $p(1- 1/e^{1/p}), (1-p)(1-1/e^{1/1-p})$-approximation
for any $0 \le p \le 1$, and prove a `hardness curve' combining several
inapproximability
results. These upper and lower bounds are always close (with a maximum
gap of $9\%$), and exactly coincide at the point $(0.43, 0.43)$.
For small capacities, we present a
smooth parametrized approximation curve for the problem
between $(0,1-1/e)$ and $(1/2,0)$ passing
through a $(1/3,0.3698)$-approximation.
View details

Simultaneous Approximations for Adversarial and Stochastic Online Budgeted Allocation

Shayan Oveis Gharan

Symposium on Discrete Algorithms (SODA), ACM/SIAM(2012)

Preview abstract
Motivated by online ad allocation, we study the problem of simultaneous approximations for the adversarial and stochastic online budgeted allocation problem. This problem consists of a bipartite graph G = (X, Y, E), where the nodes of Y along with their corresponding capacities are known beforehand to the algorithm, and the nodes of X arrive online. When a node of X arrives, its incident edges, and their respective weights are revealed, and the algorithm can match it to a neighbor in Y. The objective is to maximize the weight of the final matching, while respecting the capacities.
When nodes arrive in an adversarial order, the best competitive ratio is known to be 1 - 1/e, and it can be achieved by the Ranking [18], and its generalizations (Balance [16, 21]). On the other hand, if the nodes arrive through a random permutation, it is possible to achieve a competitive ratio of 1 -- ε [9]. In this paper we design algorithms that achieve a competitive ratio better than 1 -- 1/e on average, while preserving a nearly optimal worst case competitive ratio. Ideally, we want to achieve the best of both worlds, i.e, to design an algorithm with the optimal competitive ratio in both the adversarial and random arrival models. We achieve this for unweighted graphs, but show that it is not possible for weighted graphs.
In particular, for unweighted graphs, under some mild assumptions, we show that Balance achieves a competitive ratio of 1 -- ε in a random permutation model. For weighted graphs, however, we prove this is not possible; we prove that no online algorithm that achieves an approximation factor of 1 -- 1/ε for the worst-case inputs may achieve an average approximation factor better than 97.6% for random inputs. In light of this hardness result, we aim to design algorithms with improved approximation ratios in the random arrival model while preserving the competitive ratio of 1 -- 1/ε in the worst case. To this end, we show the algorithm proposed by [21] achieves a competitive ratio of 0.76 for the random arrival model, while having a 1 -- 1/ε ratio in the worst case.
View details

Online Stochastic Weighted Matching: Improved Approximation Algorithms

Preview
Bernard Haeupler

Workshop of Network and Internet Economics (WINE) 2011

Permutation betting markets: singleton betting with extra information

Preview
Mohammad Ghodsi

Hamid Mahini

ACM Conference on Electronic Commerce(2008), pp. 180-189

No Results Found