Soumya Basu

I am a software engineer at Google. My research interest lies in the theory and practice of online learning for enhancing complex systems, and I am excited to apply this expertise to improve performance across Google's large array of products. Before joining Google, I obtained my Ph.D. in ECE at UT Austin under the supervision of Prof. Evdokia Nikolova and Prof. Sanjay Shakkottai, where I was a member of the Wireless Networking and Communications Group (WNCG). Earlier, during my undergraduate and Master's studies, I worked under Prof. Goutam Das at IIT Kharagpur.
Authored Publications
Many modern high-performing machine learning models, such as GPT-3, primarily rely on scaling up models, e.g., transformer networks. Simultaneously, a parallel line of work aims to improve model performance by augmenting an input instance with other (labeled) instances during inference. Examples of such augmentations include task-specific prompts and similar examples retrieved from the training data by a nonparametric component. Remarkably, retrieval-based methods have enjoyed success on a wide range of problems, ranging from standard natural language processing and vision tasks to protein folding, as demonstrated by many recent efforts, including WebGPT and AlphaFold. Despite a growing literature showcasing the promise of these models, their theoretical underpinning remains underexplored. In this paper, we present a formal treatment of retrieval-based models to characterize their generalization ability. In particular, we focus on two classes of retrieval-based classification approaches. First, we analyze a local learning framework that employs an explicit local empirical risk minimization based on retrieved examples for each input instance. Interestingly, we show that breaking down the underlying learning task into local sub-tasks enables the model to employ a low-complexity parametric component while ensuring good overall accuracy. The second class of retrieval-based approaches we explore learns a global model using kernel methods to directly map an input instance and retrieved examples to a prediction, without explicitly solving a local learning task.
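To make the local learning framework concrete, here is a minimal sketch of the retrieve-then-locally-fit pattern, assuming Euclidean nearest-neighbor retrieval and a ridge-regularized linear classifier as the low-complexity parametric component; the function names, hyperparameters, and {0, 1} label encoding are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def retrieve(x, X_train, y_train, k):
    """Return the k training examples closest to x in Euclidean distance."""
    dist = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dist)[:k]
    return X_train[idx], y_train[idx]

def local_predict(x, X_train, y_train, k=25, reg=1e-3):
    """Local ERM sketch: fit a ridge-regularized linear classifier on the
    retrieved neighborhood of x only, then classify x with it."""
    Xk, yk = retrieve(x, X_train, y_train, k)
    Xk1 = np.hstack([Xk, np.ones((len(Xk), 1))])       # append a bias column
    w = np.linalg.solve(Xk1.T @ Xk1 + reg * np.eye(Xk1.shape[1]),
                        Xk1.T @ (2.0 * yk - 1.0))      # {0,1} labels -> {-1,+1}
    return int(np.append(x, 1.0) @ w > 0)
```

The structure, not the particular solver, is the point: each test instance gets its own small empirical risk minimization over the k retrieved examples, so the parametric component can stay simple while overall accuracy remains good.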
    MmWave Codebook Selection in Rapidly-Varying Channels via Multinomial Thompson Sampling
    Robert W. Heath Jr.
    Sanjay Shakkottai
    Yi Zhang
    Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, ACM (2021)
Millimeter-wave (mmWave) communication using directional beams is a key enabler for high-throughput mobile ad hoc networks. These directional beams are organized into multiple codebooks according to beam resolution, with each codebook consisting of a set of equal-width beams that cover the whole angular space. The codebook with narrow beams delivers high throughput, at the expense of scanning time. Therefore, overall throughput maximization is achieved by selecting a mmWave codebook that balances beamwidth (beamforming gain) against beam-alignment overhead. Further, these codebooks exhibit potential natural structures, such as non-decreasing instantaneous rate or unimodal throughput, as one traverses from the codebook with wide beams to the one with narrow beams. We study the codebook selection problem through a multi-armed bandit (MAB) formulation in mmWave networks with rapidly-varying channels. We develop multiple novel Thompson Sampling-based algorithms for this setting, given different codebook structures, with theoretical guarantees on regret. We further collect real-world (60 GHz) measurements with 12-antenna phased arrays and show the performance benefits of our approaches in an IEEE 802.11ad/ay emulation setting.
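As a concrete illustration, below is a minimal sketch of the unstructured multinomial Thompson sampling baseline, assuming each codebook's throughput takes one of finitely many known levels and keeping a Dirichlet posterior over their probabilities per codebook. The `pull` callback and the uniform Dirichlet priors are assumptions; the paper's structured variants (non-decreasing rate, unimodal throughput) impose additional constraints not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def multinomial_ts(levels, pull, T):
    """Multinomial Thompson sampling sketch: levels[k] lists the possible
    throughput values of codebook k; pull(k) scans with codebook k and
    returns the index of the observed throughput level (environment call)."""
    alpha = [np.ones(len(lv)) for lv in levels]    # Dirichlet(1,...,1) priors
    for _ in range(T):
        # sample a mean throughput per codebook from its Dirichlet posterior
        means = [rng.dirichlet(a) @ np.asarray(lv, dtype=float)
                 for a, lv in zip(alpha, levels)]
        k = int(np.argmax(means))                  # play the sampled best
        alpha[k][pull(k)] += 1                     # conjugate posterior update
    return alpha
```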
    Combinatorial Blocking Bandits with Stochastic Delays
    Alexia Atsidakou
    Constantine Caramanis
    Orestis Papadigenopoulos
    Sanjay Shakkottai
    ICML 2021 (2021)
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pull. In this direction, a simple (yet widely applicable) model is that of blocking bandits, where an arm becomes unavailable for a deterministic number of rounds after each play. In this work, we extend the above model in two directions: (i) we consider the general combinatorial setting, where more than one arm can be played at each round subject to feasibility constraints, and (ii) we allow the blocking time of each arm to be stochastic. We first study the computational/unconditional hardness of this setting and identify the conditions necessary for the problem to become tractable (even in an approximate sense). Based on these conditions, we provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the maximum-expected-reward feasible subset among the available (non-blocked) arms. When the arms' expected rewards are unknown, we adapt this heuristic into a UCB-based bandit algorithm, for which we provide sublinear (approximate) regret guarantees that match the theoretical lower bounds in the limiting case of no delays.
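The greedy heuristic itself is simple to state in code. Below is a sketch under known mean rewards, where `feasible_max` stands in for a combinatorial oracle returning a maximum-expected-reward feasible subset of the available arms and `delay_sampler` draws the stochastic blocking delays; both names are hypothetical.

```python
import numpy as np

def greedy_blocking(mu, delay_sampler, feasible_max, T):
    """Greedy sketch: each round, play the max-expected-reward feasible
    subset of currently available arms, then block the played arms for
    stochastic delays. mu[i] is arm i's known mean reward."""
    free_at = np.zeros(len(mu))   # round at which each arm becomes available
    total = 0.0
    for t in range(T):
        avail = [i for i in range(len(mu)) if free_at[i] <= t]
        for i in feasible_max(avail, mu):
            total += mu[i]
            free_at[i] = t + 1 + delay_sampler(i)  # blocked for a random delay
    return total
```

The bandit version described in the abstract replaces the known means `mu` with UCB indices; the subset-selection step is otherwise unchanged.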
    Beyond $\log^2(T)$ Regret for Decentralized Bandits in Matching Markets
    Abishek Sankararaman
    Karthik Abinav Sankararaman
    ICML 2021 (2021)
We design decentralized algorithms for regret minimization in two-sided matching markets with one-sided bandit feedback that significantly improve upon prior works (Liu et al. 2020a, 2020b; Sankararaman et al. 2020). First, for general markets, for any $\varepsilon > 0$, we design an algorithm that achieves $O(\log^{1+\varepsilon}(T))$ regret to the agent-optimal stable matching with unknown time horizon $T$, improving upon the $O(\log^{2}(T))$ regret achieved in (Liu et al. 2020b). Second, we provide the optimal $\Theta(\log(T))$ agent-optimal regret for markets satisfying uniqueness consistency: markets where leaving participants don't alter the original stable matching. Previously, $\Theta(\log(T))$ regret was achievable (Sankararaman et al. 2020; Liu et al. 2020b) only in the much more restricted serial dictatorship setting, where all arms have the same preference over the agents. We propose a phase-based algorithm in which, besides deleting the globally communicated dominated arms, each agent locally deletes arms with which it collides often. This local deletion is pivotal in breaking deadlocks that arise from the rank heterogeneity of agents across arms. We further demonstrate the superiority of our algorithm over existing works through simulations.
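A rough sketch of one agent's arm-selection logic, combining deletion of globally communicated dominated arms with the local deletion rule described above; the class interface, collision-count threshold, and UCB index are illustrative assumptions, and the phase boundaries and communication protocol of the actual algorithm are omitted.

```python
import numpy as np

class LocalDeletionAgent:
    """Sketch: UCB over arms that are neither globally dominated (learned
    via communication) nor locally deleted for colliding too often within
    the current phase."""
    def __init__(self, n_arms, collision_cap=10):
        self.cap = collision_cap
        self.pulls = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)
        self.collisions = np.zeros(n_arms)
        self.dominated = set()   # filled from global communication
        self.local_del = set()   # reset at every phase boundary

    def choose(self, t):
        n = np.maximum(self.pulls, 1)
        ucb = self.sums / n + np.sqrt(2 * np.log(t + 1) / n)
        ucb[list(self.dominated | self.local_del)] = -np.inf
        return int(np.argmax(ucb))

    def observe(self, arm, reward, collided):
        if collided:                       # another agent won this arm
            self.collisions[arm] += 1
            if self.collisions[arm] > self.cap:
                self.local_del.add(arm)    # local deletion breaks deadlocks
        else:
            self.pulls[arm] += 1
            self.sums[arm] += reward
```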
We propose to use Thompson sampling to adapt to a sequence of bandit tasks that the algorithm interacts with sequentially. Our algorithm, which we call AdaTS, achieves lower regret than alternatives that do not adapt, by maintaining a distribution over the parameters of the prior, which drives additional exploration. AdaTS is a natural fully-Bayesian algorithm that can be implemented efficiently in a number of scenarios, as we demonstrate in this work. Our bounds on its Bayes regret show the benefits of the extra layer of adaptivity. Our theoretical findings are also supported by experiments, where AdaTS outperforms prior algorithms and is applied to a challenging real-world problem.
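A minimal AdaTS-flavored sketch for a Gaussian model, assuming each task's arm means are drawn from N(mu_a, sigma0^2) with the prior means mu_a themselves unknown under a N(0, q^2) hyper-prior: before each task the prior is sampled from the hyper-posterior, vanilla Thompson sampling runs with that sampled prior, and the hyper-posterior is then updated. The final update below is a crude conjugate approximation for illustration, not the paper's exact recursion.

```python
import numpy as np

rng = np.random.default_rng(1)

def adats_sketch(run_task, n_arms, n_tasks, T, sigma=1.0, sigma0=1.0, q=1.0):
    """Hedged AdaTS-style sketch: keep a Gaussian hyper-posterior over the
    unknown prior mean of each arm; run_task(task, a) returns the reward of
    arm a in the given task (environment call)."""
    hyper_mean = np.zeros(n_arms)
    hyper_var = np.full(n_arms, q ** 2)
    for task in range(n_tasks):
        mu0 = rng.normal(hyper_mean, np.sqrt(hyper_var))  # sample a prior
        m, v = mu0.copy(), np.full(n_arms, sigma0 ** 2)   # per-task posterior
        counts, sums = np.zeros(n_arms), np.zeros(n_arms)
        for t in range(T):
            a = int(np.argmax(rng.normal(m, np.sqrt(v)))) # Thompson step
            counts[a] += 1
            sums[a] += run_task(task, a)
            v[a] = 1.0 / (1.0 / sigma0 ** 2 + counts[a] / sigma ** 2)
            m[a] = v[a] * (mu0[a] / sigma0 ** 2 + sums[a] / sigma ** 2)
        # crude hyper-posterior update: treat the task's posterior mean as a
        # noisy observation of the prior mean with variance sigma0^2 + v
        obs_var = sigma0 ** 2 + v
        new_var = 1.0 / (1.0 / hyper_var + 1.0 / obs_var)
        hyper_mean = new_var * (hyper_mean / hyper_var + m / obs_var)
        hyper_var = new_var
    return hyper_mean, hyper_var
```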
    Contextual Blocking Bandits
    Constantine Caramanis
    Orestis Papadigenopoulos
    Sanjay Shakkottai
    International Conference on Artificial Intelligence and Statistics, PMLR (2021)
We study a novel variant of the multi-armed bandit problem, where at each time step the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a fixed and known number of future time steps. This contextual setting, which captures important scenarios such as recommendation systems or ad placement with diverse users, invalidates the greedy solution techniques that are effective for its non-contextual counterpart (Basu et al., NeurIPS 2019). Assuming knowledge of the context distribution and the mean reward of each arm-context pair, we cast the problem as an online bipartite matching problem, where the right-vertices (contexts) arrive stochastically and the left-vertices (arms) are blocked for a finite number of rounds each time they are matched. This problem has recently been studied in the full-information case, where competitive-ratio bounds have been derived. We focus on the bandit setting, where the reward distributions are initially unknown; we propose a UCB-based variant of the full-information algorithm that guarantees $O(\log T)$ regret w.r.t. an $\alpha$-optimal strategy over $T$ time steps, matching the $\Omega(\log T)$ regret lower bound in this setting. Due to the time correlations caused by blocking, existing techniques for upper-bounding regret fail. To prove our regret bounds, we introduce the novel concepts of delayed exploitation and opportunistic sub-sampling and combine them with ideas from combinatorial bandits and the coupling of non-stationary Markov chains.
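To illustrate the estimation side only, here is a hedged sketch that keeps a UCB index per arm-context pair and, when a context arrives, plays the available arm with the highest index. The paper's algorithm additionally plans against the known context distribution (using delayed exploitation and opportunistic sub-sampling) to compete with an α-optimal strategy, which this greedy-on-UCB sketch does not attempt; `draw_ctx` and `pull` are assumed environment callbacks.

```python
import numpy as np

def ucb_contextual_blocking(n_arms, n_ctx, delay, draw_ctx, pull, T):
    """Sketch: UCB estimate per (arm, context) pair, greedy play among
    non-blocked arms; delay[a] is arm a's known blocking time."""
    counts = np.zeros((n_arms, n_ctx))
    sums = np.zeros((n_arms, n_ctx))
    free_at = np.zeros(n_arms)     # round at which each arm unblocks
    for t in range(T):
        c = draw_ctx()                             # i.i.d. context arrival
        avail = np.flatnonzero(free_at <= t)
        if avail.size == 0:
            continue                               # all arms blocked this round
        n = np.maximum(counts[avail, c], 1)
        ucb = sums[avail, c] / n + np.sqrt(2 * np.log(t + 1) / n)
        a = int(avail[np.argmax(ucb)])
        counts[a, c] += 1
        sums[a, c] += pull(a, c)                   # observe the reward
        free_at[a] = t + 1 + delay[a]              # arm blocked (all contexts)
    return sums, counts
```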