Rina Panigrahy
I am broadly interested in theoretical and applied algorithms in areas such as similarity search, sketching and streaming, cuckoo hashing, learning and prediction, large graph analysis, and network algorithms. Much of my research has been related to problems arising in engineering applications, such as provable guarantees for deep learning, distributed caching for content delivery networks (work that led to the founding of Akamai Technologies), risk-reward trade-offs for stock prediction, space-efficient hashing in networking equipment, low-power ternary CAMs, space-efficient similarity search, and fast distance estimation in social networks.
Authored Publications
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guanzhe Hong
Nishanth Dikkala
Enming Luo
The 4th Workshop on Mathematical Reasoning and AI @ NeurIPS 2024
Abstract
Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve, but we can train a small transformer to achieve perfect accuracy. Building on our set-up, we then pursue an understanding of precisely how a three-layer transformer, trained from scratch, solves this problem. We are able to identify certain "planning" and "reasoning" circuits in the network that necessitate cooperation between the attention blocks to implement the desired logic. To expand our findings, we then study a larger model, Mistral 7B. Using activation patching, we characterize internal components that are critical in solving our logic problem. Overall, our work systematically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason.
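The precise test-bed construction is defined in the paper; purely as an illustration of the kind of synthetic propositional-logic data such a study trains and probes on, a minimal sketch (hypothetical format: single-premise implication rules, labels computed by forward chaining, so answering the query requires chaining several rules) could look like this:

```python
import random

def make_instance(num_vars=8, num_rules=6, seed=0):
    """Generate a toy propositional-logic instance: a few known facts,
    some implication rules, and a query variable."""
    rng = random.Random(seed)
    variables = [f"x{i}" for i in range(num_vars)]
    facts = set(rng.sample(variables, 2))          # variables that are initially true
    rules = [tuple(rng.sample(variables, 2))       # (premise, conclusion) pairs
             for _ in range(num_rules)]
    query = rng.choice(variables)
    return facts, rules, query

def forward_chain(facts, rules):
    """Derive everything reachable from the facts via the implication rules."""
    derived, changed = set(facts), True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

facts, rules, query = make_instance()
label = query in forward_chain(facts, rules)
print(f"facts={sorted(facts)} rules={rules} query={query} -> {label}")
```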
Abstract
Deep and wide neural networks successfully fit very complex functions today, but dense models are starting to be prohibitively expensive for inference. To mitigate this, one promising direction is networks that activate a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping of inputs to subnetworks (e.g., the Mixture of Experts (MoE) paradigm in Switch Transformers). However, prior work is largely empirical, and while existing routing functions work well in practice, they do not lead to theoretical guarantees on approximation ability. We aim to provide a theoretical explanation for the power of sparse networks. As our first contribution, we present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures. We then introduce a routing function based on locality sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions. After representing LSH-based sparse networks with our model, we prove that sparse networks can match the approximation power of dense networks on Lipschitz functions. Applying LSH on the input vectors means that the experts interpolate the target function in different subregions of the input space. To support our theory, we define various datasets based on Lipschitz target functions, and we show that sparse networks give a favorable trade-off between number of active units and approximation quality.
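As a minimal sketch of the LSH-routing idea described above, with illustrative dimensions, a random-hyperplane hash, and trivially simple linear "experts" (not the paper's exact construction, and no approximation guarantee is implied by this toy):

```python
import numpy as np

rng = np.random.default_rng(0)

d, num_hashes = 16, 4                       # input dimension; 2**num_hashes experts
num_experts = 2 ** num_hashes
hyperplanes = rng.normal(size=(num_hashes, d))   # random-hyperplane LSH

def route(x):
    """Map x to an expert index via its LSH bucket (sign pattern of projections)."""
    bits = (hyperplanes @ x > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# One tiny linear "expert" per bucket; only the routed expert is ever evaluated.
experts = [(rng.normal(size=(d,)), rng.normal()) for _ in range(num_experts)]

def sparse_forward(x):
    w, b = experts[route(x)]
    return float(w @ x + b)

x = rng.normal(size=d)
print(route(x), sparse_forward(x))
```

Since nearby inputs tend to fall into the same hash bucket, each expert only ever sees (and only has to fit) one subregion of the input space, which is the interpolation picture the abstract describes.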
One network fits all? Modular versus monolithic task formulations in neural networks
Abhimanyu Das
Atish Agarwala
Brendan Juba
Vatsal Sharan
ICLR 2021 (2021)
Abstract
Can deep learning solve multiple, very different tasks simultaneously? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a variety of methods for representing tasks---for example, when the distinct tasks are represented by well-separated clusters or decision trees over some task-code attributes. Indeed, more strongly, we present a novel analysis that shows that families of simple programming-like constructs for the task codings are learnable by two-layer neural networks with standard training. We study more generally how the complexity of learning such combined tasks grows with the complexity of the task codes; we find that learning many tasks can be provably hard, even though the individual tasks are easy to learn. We provide empirical support for the usefulness of the learning bounds by training networks on clusters, decision trees, and SQL-style aggregation.
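A rough sketch of one such combined-data construction, assuming tasks coded by well-separated clusters and random linear-threshold subtasks (all dimensions, the separation parameter, and the subtask family are illustrative, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def combined_dataset(num_tasks=4, per_task=256, d_code=8, d_input=8, sep=10.0):
    """Combine several binary tasks into one dataset. Each task gets a
    well-separated cluster center in the task-code coordinates; a single
    network sees [task code, task input] and must infer from the code part
    which task it is being asked to solve."""
    centers = sep * rng.normal(size=(num_tasks, d_code))     # well-separated task codes
    task_weights = rng.normal(size=(num_tasks, d_input))     # one linear-threshold task each
    X, y = [], []
    for k in range(num_tasks):
        codes = centers[k] + rng.normal(size=(per_task, d_code))  # small within-cluster noise
        inputs = rng.normal(size=(per_task, d_input))
        labels = (inputs @ task_weights[k] > 0).astype(int)
        X.append(np.hstack([codes, inputs]))
        y.append(labels)
    return np.vstack(X), np.concatenate(y)

X, y = combined_dataset()
print(X.shape, y.mean())
```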
Convergence Results for Neural Networks via Electrodynamics
Ali Rahimi
Sushant Sachdeva
Qiuyi Zhang
Innovations in Theoretical Computer Science Conference, ITCS 2018 (2018)
Abstract
We study whether a depth two neural network can learn another depth two network using gradient descent. Assuming a linear output node, we show that the question of whether gradient descent converges to the target function is equivalent to the following question in electrodynamics: Given k fixed protons in R^d, and k electrons, each moving due to the attractive force from the protons and repulsive force from the remaining electrons, whether at equilibrium all the electrons will be matched up with the protons, up to a permutation. Under the standard electrical force, this follows from the classic Earnshaw's theorem. In our setting, the force is determined by the activation function and the input distribution. Building on this equivalence, we prove the existence of an activation function such that gradient descent learns at least one of the hidden nodes in the target network. Iterating, we show that gradient descent can be used to learn the entire network one node at a time.
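In rough notation inferred from the setup stated above (the exact normalization and assumptions are in the paper), the equivalence comes from expanding the population squared loss of a depth-two learner against a depth-two target into pairwise "potentials":

```latex
% Sketch, with notation assumed from the abstract: linear output node, hidden weights
% w_1,...,w_k for the learner and \theta_1,...,\theta_k for the target, activation \sigma.
%   f_w(x) = \sum_{i=1}^k \sigma(w_i \cdot x), \qquad f_\theta(x) = \sum_{j=1}^k \sigma(\theta_j \cdot x).
\[
L(w) \;=\; \mathbb{E}_x\!\left[(f_w(x) - f_\theta(x))^2\right]
 \;=\; \sum_{i,i'} \Phi(w_i, w_{i'}) \;-\; 2\sum_{i,j} \Phi(w_i, \theta_j) \;+\; \mathrm{const},
\qquad \Phi(u,v) \;:=\; \mathbb{E}_x\!\left[\sigma(u \cdot x)\,\sigma(v \cdot x)\right].
\]
% Gradient descent on each w_i therefore moves it under "repulsion" from the other w_{i'}
% (the electrons) and "attraction" toward the \theta_j (the protons), with the force law
% \nabla_u \Phi determined by the activation function and the input distribution.
```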
Algorithms for ℓp Low Rank Approximation
Flavio Chierichetti
David P. Woodruff
ICML '17 (2017)
Abstract
We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise ℓp-approximation error, for any p≥1; the case p=2 is the classical SVD problem. We obtain the first provably good approximation algorithms for this version of low-rank approximation that work for every value of p≥1, including p=∞. Our algorithms are simple, easy to implement, work well in practice, and illustrate interesting tradeoffs between the approximation quality, the running time, and the rank of the approximating matrix.
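The paper's own algorithms and their guarantees are described in the reference; as a hedged, purely illustrative baseline for the entrywise ℓ1 objective (an alternating IRLS heuristic, not the paper's method, with no approximation guarantee claimed):

```python
import numpy as np

def l1_lowrank(A, r, iters=50, eps=1e-6, seed=0):
    """Heuristic alternating/IRLS scheme for entrywise-l1 low-rank approximation:
    roughly minimize sum_ij |A_ij - (U V)_ij| with U (n x r) and V (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = rng.normal(size=(n, r))
    V = rng.normal(size=(r, m))
    for _ in range(iters):
        W = 1.0 / (np.abs(A - U @ V) + eps)      # IRLS weights push squared loss toward l1
        for i in range(n):                        # update each row of U by weighted least squares
            Wd = np.diag(W[i])
            U[i] = np.linalg.solve(V @ Wd @ V.T + eps * np.eye(r), V @ Wd @ A[i])
        W = 1.0 / (np.abs(A - U @ V) + eps)
        for j in range(m):                        # update each column of V symmetrically
            Wd = np.diag(W[:, j])
            V[:, j] = np.linalg.solve(U.T @ Wd @ U + eps * np.eye(r), U.T @ Wd @ A[:, j])
    return U, V

A = np.random.default_rng(1).normal(size=(20, 15))
U, V = l1_lowrank(A, r=3)
print(np.abs(A - U @ V).sum())
```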
Partitioning Orders in Online Shopping Services
Debmalya Panigrahi
Conf. on Information and Knowledge Management (CIKM) (2017)
Abstract
The rapid growth of the sharing economy has led to the widespread use of newer and richer models of online shopping and delivery services. The race to deliver fast has transformed such services into complex networks of shoppers, stores, and consumers. Needless to say, the efficiency of store order management is critical to the business.
Motivated by this setting, we consider the following problem: given a set of online shopping orders, each consisting of a few items, how should the orders best be partitioned among a given number of pickers? Owing to logistical constraints, the orders are typically unsplittable in the partition. This partitioning, which must take into account the physical location of the items in the store, has to optimize the utilization of, and the amount of work done by, the shoppers in the store. Formulating this as a combinatorial optimization problem, we propose a family of simple and efficient algorithms that accommodate natural constraints arising in this setting. In addition to showing provable guarantees for the algorithms, we also demonstrate their efficiency in practice on real-world data from Google Express [1], outperforming natural baselines.
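The paper's formulation also accounts for where items sit in the store; as a minimal sketch of just the unsplittable-orders constraint with a load-balancing stand-in objective (all names and order sizes hypothetical, not the paper's algorithm):

```python
import heapq

def partition_orders(order_sizes, num_pickers):
    """Greedy 'largest order first' assignment of unsplittable orders to pickers,
    balancing total workload: each order goes, whole, to the currently least-loaded picker."""
    heap = [(0, p) for p in range(num_pickers)]          # min-heap of (current_load, picker_id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_pickers)}
    for order_id, size in sorted(enumerate(order_sizes), key=lambda t: -t[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(order_id)
        heapq.heappush(heap, (load + size, p))
    return assignment

print(partition_orders([5, 3, 8, 2, 7, 4, 6], num_pickers=3))
```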