# Rajesh Jayaram

Rajesh is a Research Scientist at Google NYC, where he is part of the NYC Algorithms and Optimization team. He received his PhD in Computer Science from Carnegie Mellon University in 2021, where he was advised by David P. Woodruff. Prior to that, in 2017 he received his B.S. in Mathematics and Computer Science from Brown University. His research focuses primarily on sublinear algorithms, especially randomized sketching and streaming algorithms for problems in big-data. In general, he enjoys thinking about dimensionality reduction: namely, to what extent can we compress the significant components of an enormous, noisy data-set? His work also spans property testing, machine learning, and optimization.

He also has a personal homepage.

He also has a personal homepage.

Authored Publications

Google Publications

Other Publications

Sort By

Optimal Fully Dynamic k-Center Clustering for Adaptive and Oblivious Adversaries

Preview
Monika Henzinger

Andreas Wiese

Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Streaming Euclidean MST to a Constant Factor

Amit Levi

Erik Waingarten

Xi Chen

54rd Annual ACM Symposium on Theory of Computing (STOC'23) (2023)

Preview abstract
We study streaming algorithms for the fundamental geometric problem of computing the cost of the Euclidean Minimum Spanning Tree (MST) on an $n$-point set $X \subset \R^d$. In the streaming model, the points in $X$ can be added and removed arbitrarily, and the goal is to maintain an approximation in small space. In low dimensions, $(1+\epsilon)$ approximations are possible in sublinear space. However, for high dimensional space the best known approximation for this problem was $\tilde{O}(\log n)$, due to [Chen, Jayaram, Levi, Waingarten, STOC'22], improving on the prior $O(\log^2 n)$ bound due to [Andoni, Indyk, Krauthgamer, SODA '08]. In this paper, we break the logarithmic barrier, and give the first constant factor sublinear space approximation to Euclidean MST. For any $\epsilon\geq 1$, our algorithm achieves an $\tilde{O}(\epsilon^{-2})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.
We complement this by demonstrating that any single pass algorithm which obtains a better than $1.10$-approximation must use $\Omega(\sqrt{n})$ space, demonstrating that $(1+\epsilon)$ approximations are not possible in high-dimensions, and that our algorithm is tight up to a constant. Nevertheless, we demonstrate that $(1+\epsilon)$ approximations are possible in sublinear space with $O(1/\epsilon)$ passes over the stream. More generally, for any $\alpha \geq 2$, we give a $\alpha$-pass streaming algorithm which achieves a $O(\frac{1}{ \alpha \epsilon})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.
All our streaming algorithms are linear sketches, and therefore extend to the massively-parallel computation model (MPC). Thus, our results imply the first $(1+\epsilon)$-approximation in a constant number of rounds in the MPC model. Previously, such a result was only known for low-dimensional space ([Andoni, Nikolov, Onak, Yaroslavtsev, STOC'15]), or either required $O(\log n)$ rounds or suffered a $O(\log n)$ approximation.
View details

New Streaming Algorithms for High Dimensional EMD and MST

Amit Levi

Erik Waingarten

Xi Chen

To be submitted to ACM Symposium on Theory of Computing (STOC) (2022)

Preview abstract
We study streaming algorithms for two fundamental geometric problems: computing the cost of a Minimum Spanning Tree (MST) of an $n$-point set $X \subset \{1,2,\dots,\Delta\}^d$, and~computing the Earth Mover Distance (EMD) between two multi-sets $A,B \subset \{1,2,\dots,\Delta\}^d$ of size $n$. We~consider the turnstile model, where points can be added and removed. We give a one-pass streaming algorithm for MST and a two-pass streaming algorithm~for EMD, both achieving an approximation factor of $\tilde{O}(\log n)$ and using $\polylog(n,d,\Delta)$-space only. Furthermore, our algorithm for EMD can be compressed to a single pass using $\Delta d \cdot \polylog( n,d)$-space whenever ${|A \cap B|}/{|A \cup B|} \leq 1-\Omega(1)$. Previously, the best known sublinear-space streaming algorithms for either problem achieved an approximation of $O(\min\{ \log n , \log (\Delta d)\} \log n)$ \cite{AIK08, BDIRW20}. For MST, we also prove that any constant space streaming algorithm can only achieve an approximation of $\Omega(\log n)$, analogous to the $\Omega(\log n)$ lower bound for EMD of \cite{AIK08}.
Our algorithms are based on an improved analysis of a recursive space partitioning method known generically as the Quadtree. Specifically, we show that the Quadtree achieves an $\tilde{O}(\log n)$ approximation for both EMD and MST, improving on the $O(\min\{ \log n , \log (\Delta d)\} \log n)$ approximation of \cite{AIK08, BDIRW20}. %The main conceptual contribution is an analytical framework for studying the quadtree which goes beyond worst-case distortion of randomized tree embeddings.
View details

Stars: Tera-Scale Graph Building for Clustering and Learning

Warren Schudy

NeurIPS 2022 (2022)

Preview abstract
A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly controlled by the sparsity of the similarity graph. In this work, we present Stars: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. To complement our results, we have deployed Stars for multiple data sets allowing for graph building at the Tera-Scale, i.e., for graphs with hundreds of billions of nodes and tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10~1000-fold improvements in pairwise similarity comparisons and significant improvements in runtime with negligible quality loss.
View details

No Results Found