# MohammadHossein Bateni

MohammadHossein Bateni is a staff research scientist at Google, where he is a member of the NYC Algorithms and Optimization Team. He obtained his Ph.D. and M.A. in Computer Science from Princeton University in 2011 and 2008, respectively, after finishing his undergraduate studies with a B.Sc. in Computer Engineering at Sharif University of Technology in 2006.
Hossein is broadly interested in combinatorics and combinatorial optimization. His research focuses on approximation algorithms, distributed computing, and analysis of game-theoretic models.

## Authored Publications


Optimal Fully Dynamic k-Center Clustering for Adaptive and Oblivious Adversaries

Monika Henzinger, Andreas Wiese

Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Sequential Attention for Feature Selection

Taisuke Yasuda, Lin Chen

Proceedings of the 11th International Conference on Learning Representations (2023)

Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on L1 regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
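The greedy forward selection at the core of the algorithm can be sketched in a few lines. The toy quality function below is purely hypothetical and used only for illustration; in the paper's setting, feature importance comes from attention weights learned in a single training pass, not from an explicit quality oracle:

```python
def greedy_forward_selection(features, quality, k):
    """Greedy forward selection: repeatedly add the feature with the
    largest marginal gain in model quality until k are chosen."""
    selected, remaining = [], set(features)
    for _ in range(k):
        best = max(remaining, key=lambda f: quality(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature values with diminishing returns, for
# illustration only.
WEIGHTS = {"age": 5.0, "income": 4.0, "zip": 1.0, "noise": 0.1}

def toy_quality(subset):
    total, discount = 0.0, 1.0
    for f in sorted(subset, key=lambda g: -WEIGHTS[g]):
        total += discount * WEIGHTS[f]
        discount *= 0.5
    return total

print(greedy_forward_selection(WEIGHTS, toy_quality, 2))  # ['age', 'income']
```

Each round re-scores every remaining feature given what is already selected, which is exactly the "residual value" the abstract contrasts with one-shot selection.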

Tackling Provably Hard Representative Selection via Graph Neural Networks

Transactions on Machine Learning Research (2023)

Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks.

Beating Approximation Factor 2 for Minimum k-way Cut in Planar and Minor-free Graphs

Alireza Farhadi, MohammadTaghi Hajiaghayi

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM (2019), pp. 1055-1068

The k-cut problem asks, given a connected graph G and a positive integer k, to find a minimum-weight set of edges whose removal splits G into k connected components. We give the first polynomial-time algorithm with approximation factor 2−ε (for a constant ε>0) for the k-cut problem in planar and minor-free graphs. Applying more complex techniques, we further improve our method and give a polynomial-time approximation scheme for the k-cut problem in both planar and minor-free graphs. Despite persistent effort, to the best of our knowledge, this is the first improvement for the k-cut problem over the standard approximation factor of 2 in any major class of graphs.
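For a concrete sense of the objective being approximated, here is a brute-force, exponential-time computation of the minimum k-cut on a tiny weighted graph. It illustrates only the problem definition, not the paper's algorithm:

```python
from itertools import product

def min_k_cut(nodes, edges, k):
    """Exact minimum k-cut by brute force (exponential time; tiny graphs
    only). Assign each node one of k part labels, require every part to
    be nonempty, and minimize the total weight of crossing edges."""
    nodes = list(nodes)
    best = float("inf")
    for labels in product(range(k), repeat=len(nodes)):
        if len(set(labels)) != k:
            continue  # some part is empty
        part = dict(zip(nodes, labels))
        best = min(best, sum(w for u, v, w in edges if part[u] != part[v]))
    return best

# Weighted triangle: the cheapest 2-cut isolates b (weight 1 + 2 = 3).
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
print(min_k_cut("abc", edges, 2))  # 3
```

A 2-approximation would be allowed to return any partition of cost at most twice this optimum; the paper's contribution is getting (1+ε)-close in planar and minor-free graphs.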

Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs

Aaron Bernstein, Cliff Stein, Sepehr Assadi

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM (2019), pp. 1616-1635

Maximum matching and minimum vertex cover are among the most fundamental graph optimization problems. Recently, randomized composable coresets were introduced as an effective technique for solving these problems in various models of computation on massive graphs. In this technique, one partitions the edges of an input graph randomly into multiple pieces, compresses each piece into a smaller subgraph, namely a coreset, and solves the problem on the union of these coresets to find the final solution. By designing small-size randomized composable coresets, one can obtain efficient algorithms, in a black-box way, in multiple computational models including streaming, distributed communication, and the massively parallel computation (MPC) model.

We develop randomized composable coresets of size Õ(n) that, for any constant ε > 0, give a (3/2 + ε)-approximation to matching and a (3 + ε)-approximation to vertex cover. Our coresets improve upon the previous best approximation ratios of O(1) for matching and O(log n) for vertex cover. Most notably, our result for matching goes beyond a 2-approximation, which is a natural barrier for maximum matching in many models of computation. Our coresets lead to improved algorithms for the simultaneous communication model with randomly partitioned input, the streaming model when the input arrives in a random order, and the MPC model with Õ(n√n) memory per machine and only two MPC rounds.

Furthermore, inspired by the recent work of Czumaj et al. (arXiv 2017), we study algorithms for matching and vertex cover in the MPC model with only Õ(n) memory per machine. Building on our coreset constructions, we develop parallel algorithms that give an O(1)-approximation to both matching and vertex cover in only O(log log n) MPC rounds and Õ(n) memory per machine. We further improve the approximation ratio of our matching algorithm to (1 + ε) for any constant ε > 0. Our results settle multiple open questions posed by Czumaj et al.

A key technical ingredient of our paper is a novel application of edge degree constrained subgraphs (EDCS), which were previously introduced in the context of maintaining matchings in dynamic graphs. At the heart of our proofs are new structural properties of EDCS that identify these subgraphs as sparse certificates for large matchings and small vertex covers, and which are quite robust to sampling and composition.
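The composable-coreset pipeline can be sketched with a crude stand-in compressor. This sketch uses a greedy maximal matching as the per-piece coreset; the paper's actual construction is EDCS-based and achieves much stronger approximation guarantees:

```python
import random

def greedy_matching(edges):
    """Maximal matching: scan edges, keep any edge whose endpoints are free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def coreset_matching(edges, num_pieces=4, seed=0):
    """Randomized composable coreset pipeline: partition the edges at
    random, compress each piece into a small subgraph (here, a maximal
    matching), then solve on the union of the compressed pieces."""
    rng = random.Random(seed)
    pieces = [[] for _ in range(num_pieces)]
    for e in edges:
        pieces[rng.randrange(num_pieces)].append(e)
    union = [e for piece in pieces for e in greedy_matching(piece)]
    return greedy_matching(union)
```

The structure matches the description in the abstract: random edge partition, per-piece compression, and a final solve on the union; the quality of the end result hinges entirely on what the compression step preserves.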

Categorical Feature Compression via Submodular Optimization

Lin Chen

International Conference on Machine Learning (2019), pp. 515-523

In modern machine learning tasks, the presence of categorical features with extremely large vocabularies is a reality. This becomes a bottleneck when using an ML model, which generally grows at least linearly with the vocabulary size, affecting the memory, training and inference costs, as well as overfitting risk. In this work, we seek to compress the vocabulary by maximizing the mutual information between the compressed categorical feature and the target binary labels. We note the relationship of this problem to that of quantization in a discrete memoryless channel, where there exists a quadratic-time algorithm to solve the problem. Unfortunately, such an algorithm does not scale to data sets with massive vocabularies and, in this paper, we develop a distributed quasi-linear O(n log n) algorithm with provable approximation guarantees. We first observe that although entropy is a submodular function, this is not the case for mutual information between a categorical feature and label. To tackle this problem, we define a set function over a different space, which still contains the optimal solution, and prove this function is submodular. We also provide a query oracle to the submodular function that runs in amortized logarithmic time, and is easy to compute in a distributed fashion. Combining these results with a greedy algorithm allows us to achieve a (1-1/e)-approximation in quasi-linear time. Finally, we compare our proposed algorithm to several existing approaches using the large-scale Criteo learning task and demonstrate better performance in retaining mutual information and also verify the learning performance of the compressed vocabulary.
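A minimal sketch of the objective: estimating the mutual information between a categorical feature and a binary label, and checking that merging two categories with identical label distributions loses nothing. The data and bucket names are illustrative only; the paper's contribution is doing this maximization at scale with submodular machinery, not this direct computation:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X; Y) in nats, estimated from a sample of (category, label) pairs."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def compress(pairs, mapping):
    """Apply a vocabulary compression by mapping each category to its bucket."""
    return [(mapping.get(x, x), y) for x, y in pairs]

# Categories "a" and "b" have identical label distributions, so merging
# them into one bucket preserves the mutual information with the label.
data = [("a", 1), ("a", 0), ("b", 1), ("b", 0), ("c", 1), ("c", 1)]
merged = compress(data, {"a": "ab", "b": "ab"})
```

Compressing the vocabulary means choosing such a mapping with a budgeted number of buckets so that as little mutual information as possible is lost.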

Cache-aware load balancing of data center applications

Aaron Schild, Ray Yang, Richard Zhuang

Proceedings of the VLDB Endowment, 12 (2019), pp. 709-723

Our deployment of cache-aware load balancing in the Google web search backend roughly halved cache misses, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily.
A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. Flash bandwidth is a critical bottleneck, motivating an application-directed RAM cache on each replica. Sending the same term reliably to the same replica would increase the chance it hits cache, and avoid polluting the other replicas' caches. However, most queries contain multiple terms and we have to send the whole query to one replica, so it is not possible to achieve a perfect partitioning of terms to replicas.
We solve this via a voting scheme, whereby the load balancer conducts a weighted vote by the terms in each query, and sends the query to the winning replica. We develop a multi-stage scalable algorithm to learn these weights. We first construct a large-scale term-query graph from logs and apply a distributed balanced graph partitioning algorithm to cluster each term to a preferred replica. This yields a good but simplistic initial voting table, which we then iteratively refine via cache simulation to capture feedback effects.
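The voting scheme can be sketched as follows. The voting table here is a made-up toy (the paper learns the weights via term-query graph partitioning followed by cache simulation), and the hash fallback for unseen terms is an assumption of this sketch, not a detail from the paper:

```python
from collections import defaultdict

def route_query(terms, votes, num_replicas):
    """Weighted vote: each query term votes for replicas using its learned
    weights; the query goes to the replica with the highest total vote."""
    tally = defaultdict(float)
    for term in terms:
        for replica, weight in votes.get(term, {}).items():
            tally[replica] += weight
    if not tally:  # no learned weights for any term: fall back to hashing
        return hash(tuple(terms)) % num_replicas
    return max(tally, key=tally.get)

# Hypothetical voting table for illustration only.
votes = {"cat": {0: 2.0}, "dog": {1: 3.0}, "fish": {0: 0.5, 1: 0.5}}
print(route_query(["cat", "dog"], votes, 2))  # 1: dog's vote outweighs cat's
```

Because routing depends only on a term-to-weight lookup, the balancer stays stateless and fast; all the intelligence lives in how the table is learned offline.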

Optimal Distributed Submodular Optimization via Sketching

Hossein Esfandiari

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2018), pp. 1138-1147

As an important special case of submodular optimization, coverage problems are central to optimization, with a wide range of applications in data mining and machine learning. As we need to handle larger and larger data sets, there is a clear need to develop distributed solutions to these problems. While several results have been developed for distributed coverage maximization, all the existing methods have notable limitations: they achieve either suboptimal approximation guarantees or suboptimal space and memory complexities. Moreover, most previous results for submodular maximization either explicitly or implicitly assume value oracle access to the submodular function. Such a value oracle for coverage functions has the following form: given a subfamily of (input) subsets, determine the size of the union of the subsets in this subfamily.
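The coverage value oracle described in the abstract is simple to state in code; a minimal sketch:

```python
def coverage_oracle(family):
    """Value oracle for a coverage function: f(S) is the size of the union
    of the input subsets whose indices are in S."""
    def f(indices):
        covered = set()
        for i in indices:
            covered |= family[i]
        return len(covered)
    return f

# Three input subsets over the universe {1, ..., 5}.
f = coverage_oracle([{1, 2, 3}, {3, 4}, {5}])
print(f([0, 1]))     # 4: the union is {1, 2, 3, 4}
print(f([0, 1, 2]))  # 5: the whole universe
```

Answering such a query exactly requires touching every element of the chosen subsets, which is precisely why oracle-based algorithms become expensive at scale and why sketching the sets instead is attractive.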