Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.



    Abstract: We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where in each round, we design and present a game together with a strategy profile to the agents, and receive a multiple-bit observation -- for each agent, we observe whether or not they would like to deviate from the specified strategy in this given game. Our contributions are three-fold: First, we show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to choose any normal-form game in each round, matching the information-theoretic lower bound, and the result can be extended to congestion games. Second, in a more restricted setting where we can only choose a graphical game with degree limit $d$, we develop an algorithm to learn the coalition structure in $O(n/d+\log d)$ rounds. Third, when we can only learn the coalition structure through running second-price auctions with personalized reserve prices, we show that the coalition structure can be learned in $O(c\log n)$ rounds, where $c$ is the size of the largest coalition.
    Abstract: Generative AI (GenAI), particularly Large Language Models (LLMs), offers powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements, and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide some background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload. In addition, we explore the potential of LLM-powered applications for improving clinical efficiency. These applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload due to combined RPM/EHR data complexities.
    Abstract: Understanding and controlling the reasoning processes of large language models (LLMs) is crucial for their reliable deployment. In this work, we investigate the latent representation of self-evaluation behavior (the ability of a model to assess its own reasoning steps), a behavior vital for robust reasoning. Through targeted steering vector computation, we identify a direction within LLM activations that represents this self-evaluation behavior. Crucially, we demonstrate that this steering vector for self-evaluation exhibits remarkable cross-contextual efficacy, working well across different domains (e.g., math and medicine) and languages (e.g., English and Spanish). This suggests that the identified latent direction captures a fundamental, abstract representation of self-evaluation within the LLM's internal state, offering a promising avenue for interpretable and controllable reasoning across diverse applications.
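The abstract does not spell out how the steering vector is computed; a common recipe for this kind of activation steering is a difference of mean activations between prompts that do and do not elicit the target behavior, added back to the residual stream at generation time. The sketch below illustrates that recipe only; the array shapes, the scaling factor `alpha`, and the use of random arrays in place of real model traces are illustrative assumptions, not details from the paper.

```python
import numpy as np

def compute_steering_vector(acts_with_behavior, acts_without_behavior):
    """Difference-of-means steering vector.

    Each argument has shape (num_prompts, hidden_dim) and holds activations at a
    chosen layer, collected from prompts that do / do not elicit self-evaluation.
    """
    return acts_with_behavior.mean(axis=0) - acts_without_behavior.mean(axis=0)

def apply_steering(hidden_state, steering_vector, alpha=1.0):
    """Add the (scaled) steering direction to a hidden state during generation."""
    return hidden_state + alpha * steering_vector

# Toy usage with random activations standing in for real model traces.
rng = np.random.default_rng(0)
acts_pos = rng.normal(size=(32, 4096))
acts_neg = rng.normal(size=(32, 4096))
v = compute_steering_vector(acts_pos, acts_neg)
steered = apply_steering(acts_neg[0], v, alpha=2.0)
```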
    Reconfigurable Stream Network Architecture
    Chengyue Wang
    Jason Cong
    James Hoe
    International Symposium on Computer Architecture (ISCA) (2025)
    Abstract: As AI systems grow increasingly specialized and complex, managing hardware heterogeneity becomes a pressing challenge. How can we efficiently coordinate and synchronize heterogeneous hardware resources to achieve high utilization? How can we minimize the friction of transitioning between diverse computation phases, reducing costly stalls from initialization, pipeline setup, or drain? Our insight is that a network abstraction at the ISA level naturally unifies heterogeneous resource orchestration and phase transitions. This paper presents a Reconfigurable Stream Network Architecture (RSN), a novel ISA abstraction designed for the DNN domain. RSN models the datapath as a circuit-switched network with stateful functional units as nodes and data streaming on the edges. Programming a computation corresponds to triggering a path. Software is explicitly exposed to the compute and communication latency of each functional unit, enabling precise control over data movement for optimizations such as compute-communication overlap and layer fusion. As nodes in a network naturally differ, the RSN abstraction can efficiently virtualize heterogeneous hardware resources by separating control from the data plane, enabling low instruction-level intervention. We build a proof-of-concept design, RSN-XNN, on VCK190, a heterogeneous platform with FPGA fabric and AI engines. Compared to the SOTA solution on this platform, it reduces latency by 6.1x and improves throughput by 2.4x-3.2x. Compared to the T4 GPU with the same FP32 performance, it matches latency with only 18% of the memory bandwidth. Compared to the A100 GPU at the same 7nm process node, it achieves 2.1x higher energy efficiency in FP32.
    Deep Multi-modal Species Occupancy Modeling
    Timm Haucke
    Yunyi Shen
    Levente Klein
    David Rolnick
    Lauren Gillespie
    Sara Beery
    bioRxiv (2025)
    Abstract: Occupancy models are tools for modeling the relationship between habitat and species occurrence while accounting for the fact that species may still be present even if not detected. The types of environmental variables typically used for characterizing habitats in such ecological models, such as precipitation or tree cover, are frequently of low spatial resolution, with a single value for a spatial pixel size of, e.g., 1 km². This spatial scale fails to capture the nuances of micro-habitat conditions that can strongly influence species presence, and additionally, as many of these are derived from satellite data, there are aspects of the environment they cannot capture, such as the structure of vegetation below the forest canopy. We propose to combine high-resolution satellite and ground-level imagery to produce multi-modal environmental features that better capture micro-habitat conditions, and incorporate these multi-modal features into hierarchical Bayesian species occupancy models. We leverage pre-trained deep learning models to flexibly capture relevant information directly from raw imagery, in contrast to traditional approaches which rely on derived and/or hand-crafted sets of ecosystem covariates. We implement deep multi-modal species occupancy modeling using a new open-source Python package for ecological modeling, designed for bridging machine learning and statistical ecology. We test our method under a strict evaluation protocol on 16 mammal species across thousands of camera traps in Snapshot USA surveys, and find that multi-modal features substantially enhance predictive power compared to traditional environmental variables alone. Our results not only highlight the predictive value and complementarity of in-situ samples, but also make the case for more closely integrating deep learning models and traditional statistical ecological models.
    Abstract: Users of routing services like Apple Maps, Google Maps, and Waze frequently wonder why a given route is proposed. This question particularly arises when dynamic conditions like traffic and road closures cause unusual routes to be proposed. While many such dynamic conditions may exist in a road network at any time, only a small fraction of those conditions are typically relevant to a given user's route. In this work, we give a simple algorithm that identifies a small set of traffic-laden road segments that answer the following question: Which traffic conditions cause a particular shortest traffic-aware route to differ from the shortest traffic-free route? We theoretically and experimentally show that our algorithm generates small and interpretable answers to this question.
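The paper's actual algorithm is not described in the abstract. The sketch below only illustrates the problem setting: compute the traffic-free and traffic-aware shortest routes and surface the congested segments along the traffic-free route as a candidate explanation. The networkx graph, the edge attributes `free_time` / `traffic_time`, and the greedy delay ranking are all illustrative assumptions.

```python
import networkx as nx

def explain_route_change(G, source, target):
    """Toy illustration (not the paper's algorithm): list congested edges on the
    traffic-free shortest path that plausibly explain why the traffic-aware
    route differs from it. Each edge carries 'free_time' and 'traffic_time'.
    """
    free_route = nx.shortest_path(G, source, target, weight="free_time")
    traffic_route = nx.shortest_path(G, source, target, weight="traffic_time")
    if free_route == traffic_route:
        return []  # traffic did not change the recommendation
    congested = []
    for u, v in zip(free_route, free_route[1:]):
        data = G[u][v]
        if data["traffic_time"] > data["free_time"]:
            congested.append((u, v, data["traffic_time"] - data["free_time"]))
    # Largest delays first: a small, human-readable candidate explanation.
    return sorted(congested, key=lambda e: -e[2])

# Tiny example: heavy traffic on edge (A, B) pushes the route through C instead.
G = nx.DiGraph()
G.add_edge("A", "B", free_time=5, traffic_time=20)
G.add_edge("B", "D", free_time=5, traffic_time=5)
G.add_edge("A", "C", free_time=7, traffic_time=7)
G.add_edge("C", "D", free_time=7, traffic_time=7)
print(explain_route_change(G, "A", "D"))  # -> [('A', 'B', 15)]
```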
    Abstract: Estimating Origin-Destination (OD) travel demand is vital for effective urban planning and traffic management. Developing universally applicable OD estimation methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which are often a prerequisite for traditional approaches. Our proposed method directly estimates OD travel demand by systematically leveraging aggregated, anonymized statistics from Google Maps Traffic Trends, obviating the need for conventional census or city-provided OD data. The OD demand is estimated by formulating a single-level, one-dimensional, continuous nonlinear optimization problem with nonlinear equality and bound constraints to replicate highway path travel times. The method achieves efficiency and scalability by employing a differentiable analytical macroscopic network model. By design, this model is computationally lightweight, distinguished by its parsimonious parameterization that requires minimal calibration effort and its capacity for instantaneous evaluation. These attributes ensure the method's broad applicability and practical utility across diverse cities globally. Using segment sensor counts from Los Angeles and San Diego highway networks, we validate our proposed approach, demonstrating a two-thirds to three-quarters improvement in the fit to segment count data over a baseline. Beyond validation, we establish the method's scalability and robust performance in replicating path travel times across diverse highway networks, including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded evaluations, our method not only aligns with simulation-based benchmarks but also achieves an average 13% improvement in its ability to fit travel time data compared to the baseline during afternoon peak hours.
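To make the optimization formulation concrete, here is a minimal sketch of estimating OD demands by fitting path travel times under bound and equality constraints. The two-segment network, the BPR-style volume-delay curve, and all numbers are made-up stand-ins for the paper's differentiable macroscopic model, not its actual parameterization.

```python
import numpy as np
from scipy.optimize import minimize

# Toy network: segments A and B with BPR-style volume-delay curves.
# OD pair 1 travels segment A; OD pair 2 travels A then B.
FREE_FLOW, CAPACITY = 10.0, 100.0
OBSERVED_PATH_TIMES = np.array([14.3, 24.4])   # hypothetical path travel times
TOTAL_DEMAND = 130.0                           # hypothetical known total trips

def segment_time(flow):
    return FREE_FLOW * (1.0 + 0.15 * (flow / CAPACITY) ** 4)

def path_times(d):
    d1, d2 = d
    t_a = segment_time(d1 + d2)    # both OD pairs load segment A
    t_b = segment_time(d2)         # only OD pair 2 loads segment B
    return np.array([t_a, t_a + t_b])

def objective(d):
    return np.sum((path_times(d) - OBSERVED_PATH_TIMES) ** 2)

result = minimize(
    objective,
    x0=np.array([60.0, 60.0]),
    bounds=[(0.0, None), (0.0, None)],                      # bound constraints
    constraints=[{"type": "eq",                             # equality constraint
                  "fun": lambda d: d.sum() - TOTAL_DEMAND}],
    method="SLSQP",
)
print(result.x)   # estimated OD demands, roughly (80, 50) with these numbers
```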
    Performance of a Deep Learning Diabetic Retinopathy Algorithm in India
    Arthur Brant
    Xiang Yin
    Lu Yang
    Divleen Jeji
    Sunny Virmani
    Anchintha Meenu
    Naresh Babu Kannan
    Florence Thng
    Lily Peng
    Ramasamy Kim
    JAMA Network Open (2025)
    Abstract: Importance: While prospective studies have investigated the accuracy of artificial intelligence (AI) for detection of diabetic retinopathy (DR) and diabetic macular edema (DME), to date, little published data exist on the clinical performance of these algorithms. Objective: To evaluate the clinical performance of an automated retinal disease assessment (ARDA) algorithm in the postdeployment setting at Aravind Eye Hospital in India. Design, Setting, and Participants: This cross-sectional analysis involved an approximate 1% sample of fundus photographs from patients screened using ARDA. Images were graded via adjudication by US ophthalmologists for DR and DME, and ARDA's output was compared against the adjudicated grades at 45 sites in Southern India. Patients were randomly selected between January 1, 2019, and July 31, 2023. Main Outcomes and Measures: Primary analyses were the sensitivity and specificity of ARDA for severe nonproliferative DR (NPDR) or proliferative DR (PDR). Secondary analyses focused on sensitivity and specificity for sight-threatening DR (STDR) (DME or severe NPDR or PDR). Results: Among the 4537 patients with 4537 images with adjudicated grades, mean (SD) age was 55.2 (11.9) years and 2272 (50.1%) were male. Among the 3941 patients with gradable photographs, 683 (17.3%) had any DR, 146 (3.7%) had severe NPDR or PDR, 109 (2.8%) had PDR, and 398 (10.1%) had STDR. ARDA's sensitivity and specificity for severe NPDR or PDR were 97.0% (95% CI, 92.6%-99.2%) and 96.4% (95% CI, 95.7%-97.0%), respectively. Positive predictive value (PPV) was 50.7% and negative predictive value (NPV) was 99.9%. The clinically important miss rate for severe NPDR or PDR was 0% (e.g., some patients with severe NPDR or PDR were interpreted as having moderate DR and referred to clinic). ARDA's sensitivity for STDR was 95.9% (95% CI, 93.0%-97.4%) and specificity was 94.9% (95% CI, 94.1%-95.7%); PPV and NPV were 67.9% and 99.5%, respectively. Conclusions and Relevance: In this cross-sectional study investigating the clinical performance of ARDA, sensitivity and specificity for severe NPDR and PDR exceeded 96%, and ARDA caught 100% of patients with severe NPDR and PDR for ophthalmology referral. This preliminary large-scale postmarketing report of the performance of ARDA after screening 600,000 patients in India underscores the importance of monitoring and publishing an algorithm's clinical performance, consistent with recommendations by regulatory bodies.
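For readers less familiar with the screening metrics reported above, the standard definitions are given in the small helper below. This is a generic sketch of the textbook formulas, not code or counts from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard definitions of the metrics reported in screening studies."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts, purely to show the calculation.
print(screening_metrics(tp=97, fp=95, tn=2600, fn=3))
```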
    Abstract: We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy with fewer samples) against recent baselines.
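To illustrate the LLP setting, here is a minimal sketch of fitting a model from bag-level label proportions only, by gradient descent on a bag-level square loss with a linear-logistic model. This is an assumption-laden toy, not the paper's variance-reduced ERM/SGD variants; the model class, learning rate, and data are all illustrative.

```python
import numpy as np

def llp_square_loss_fit(bags, proportions, dim, lr=0.1, epochs=500, seed=0):
    """Fit a linear-logistic model so that the mean prediction in each bag
    matches the observed label proportion (bag-level square loss).

    bags: list of arrays, each of shape (bag_size, dim)
    proportions: observed positive-label proportion per bag
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for X, p in zip(bags, proportions):
            scores = 1.0 / (1.0 + np.exp(-X @ w))     # per-example predictions
            residual = scores.mean() - p              # bag-level residual
            grad += 2.0 * residual * (X.T @ (scores * (1 - scores))) / len(X)
        w -= lr * grad / len(bags)
    return w

# Synthetic usage: 20 bags of 8 examples each, 5 features.
rng = np.random.default_rng(1)
bags = [rng.normal(size=(8, 5)) for _ in range(20)]
proportions = [rng.uniform() for _ in bags]
w = llp_square_loss_fit(bags, proportions, dim=5)
```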
    Databases in the Era of Memory-Centric Computing
    Yannis Chronis
    Anastasia Ailamaki
    Lawrence Benson
    Jana Gičeva
    Eric Seldar
    Lisa Wu Wills
    2025
    Abstract: The increasing disparity between processor core counts and memory bandwidth, coupled with the rising cost and underutilization of memory, introduces a performance and cost Memory Wall and presents a significant challenge to the scalability of database systems. We argue that current processor-centric designs are unsustainable, and we advocate for a shift towards memory-centric computing, where disaggregated memory pools enable cost-effective scaling and robust performance. Database systems are uniquely positioned to leverage memory-centric systems because of their intrinsic data-centric nature. We demonstrate how memory-centric database operations can be realized with current hardware, paving the way for more efficient and scalable data management in the cloud.
    Confidence Improves Self-Consistency in LLMs
    Tom Sheffer
    Eran Ofek
    Ariel Goldstein
    Zorik Gekhman
    ACL 2025, Findings (2025)
    Abstract: Self-consistency decoding enhances LLMs’ performance on reasoning tasks by sampling diverse reasoning paths and selecting the most frequent answer. However, it is computationally expensive, as sampling many of these (lengthy) paths is required to increase the chances that the correct answer emerges as the most frequent one. To address this, we introduce Confidence-Informed Self-Consistency (CISC). CISC performs a weighted majority vote based on confidence scores obtained directly from the model. By prioritizing high-confidence paths, it can identify the correct answer with a significantly smaller sample size. When tested on nine models and four datasets, CISC outperforms self-consistency in nearly all configurations, reducing the required number of reasoning paths by over 40% on average. In addition, we introduce the notion of within-question confidence evaluation, after showing that standard evaluation methods are poor predictors of success in distinguishing correct and incorrect answers to the same question. In fact, the most calibrated confidence method proved to be the least effective for CISC. Lastly, beyond these practical implications, our results and analyses show that LLMs can effectively judge the correctness of their own outputs, contributing to the ongoing debate on this topic.
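The core aggregation step described above, a confidence-weighted majority vote over sampled answers, is easy to sketch. The snippet below assumes each sampled reasoning path has already been reduced to an (answer, confidence) pair; how those confidence scores are elicited from the model is the paper's contribution and is not reproduced here.

```python
from collections import defaultdict

def confidence_weighted_vote(samples):
    """CISC-style aggregation: sum per-answer confidence and return the argmax.

    samples: list of (answer, confidence) pairs, one per sampled reasoning path.
    """
    scores = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Example: "42" wins even though both answers were sampled twice,
# because its reasoning paths carry higher confidence.
print(confidence_weighted_vote(
    [("42", 0.9), ("41", 0.30), ("41", 0.35), ("42", 0.8)]))  # -> "42"
```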
    Abstract: Intuitively, the more complex a software system is, the harder it is to maintain. Statistically, it is not clear which complexity measures correlate with maintenance effort; in fact, it is not even clear how to objectively measure maintenance burden, so that developers’ sentiment and intuition can be supported by numbers. Without effective complexity and maintenance measures, it remains difficult to objectively monitor maintenance, control complexity, or justify refactoring. In this paper, we report a large-scale study of 1200+ projects written in C++ and Java from Google LLC. In this study, we collected three categories of measures: (1) architectural complexity, measured using propagation cost (PC), decoupling level (DL), and structural anti-patterns; (2) maintenance activity, measured using the number of changes, lines of code (LOC) written, and active coding time (ACT) spent on feature-addition vs. bug-fixing; and (3) developer sentiment on complexity and productivity, collected from 7200 survey responses. We statistically analysed the correlations among these measures and obtained significant evidence of the following findings: 1) the more complex the architecture is (higher propagation cost, more instances of anti-patterns), the more LOC is spent on bug-fixing, rather than adding new features; 2) developers who commit more changes for features, spend more lines of code on features, or spend more time on features also feel that they are less hindered by technical debt and complexity. To the best of our knowledge, this is the first large-scale empirical study establishing the statistical correlation among architectural complexity, maintenance activity, and developer sentiment. The implication is that, instead of solely relying upon developer sentiment and intuitions to detect degraded structure or increased burden to evolve, it is possible to objectively and continuously measure and monitor architectural complexity and maintenance difficulty, increasing feature delivery efficiency by reducing architectural complexity and anti-patterns.
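Of the architectural measures named above, propagation cost (PC) has a compact standard formulation (MacCormack et al.): the density of the transitive closure of the module dependency matrix, i.e., the average fraction of the system that a change to one element can potentially reach. The sketch below computes it under that formulation; the 0/1 dependency matrix and the tiny example are illustrative, and the paper's exact tooling may differ.

```python
import numpy as np

def propagation_cost(dependency_matrix):
    """Density of the transitive closure (visibility matrix) of a dependency
    matrix, counting each element as able to reach itself."""
    A = (np.asarray(dependency_matrix) > 0).astype(int)
    n = A.shape[0]
    visibility = np.eye(n, dtype=int)   # path length 0: self-reachability
    reach = np.eye(n, dtype=int)
    for _ in range(n):
        reach = ((reach @ A) > 0).astype(int)        # paths one step longer
        visibility = np.maximum(visibility, reach)
    return visibility.sum() / (n * n)

# Chain 0 -> 1 -> 2: changes to module 0 can reach everything, to 2 only itself.
deps = [[0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
print(propagation_cost(deps))  # 6/9 ≈ 0.67
```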
    Procurement Auctions via Approximate Submodular Optimization
    Amin Karbasi
    Grigoris Velegkas
    Forty-second International Conference on Machine Learning (2025)
    Abstract: We study the problem of procurement auctions, in which an auctioneer seeks to acquire services from a group of strategic sellers with private costs. The quality of the services is measured through some \emph{submodular} function that is known to the auctioneer. Our goal is to design \emph{computationally efficient} procurement auctions that (approximately) maximize the difference between the quality of the acquired services and the total cost of the sellers, in a way that is incentive compatible (IC) and individually rational (IR) for the sellers, and generates non-negative surplus (NAS) for the auctioneer. Leveraging recent results from the literature on \emph{non-positive} submodular function maximization, we design computationally efficient frameworks that transform submodular function optimization algorithms into \emph{mechanisms} that are IC and IR for the sellers, NAS for the auctioneer, and \emph{approximation-preserving}. Our frameworks are general and work both in the \emph{offline} setting, where the auctioneer can observe the bids and the services of all the sellers simultaneously, and in the \emph{online} setting, where the sellers arrive in an adversarial order and the auctioneer has to make an irrevocable decision on whether to purchase their service. We further investigate whether it is possible to convert state-of-the-art submodular optimization algorithms into descending auctions. We focus on the adversarial setting, meaning that the schedule of descending prices is determined by an adversary. We show that a submodular optimization algorithm satisfying a bi-criteria $(\alpha, 1)$-approximation in welfare can be effectively converted to a descending auction in the adversarial setting if and only if $\alpha \leq \frac{1}{2}$. Our result highlights the importance of a carefully designed schedule of descending prices for effectively converting a submodular optimization algorithm satisfying a bi-criteria $(\alpha, 1)$-approximation in welfare with $\alpha > \frac{1}{2}$ into a descending auction. We also establish a connection between descending auctions and online submodular optimization algorithms. We demonstrate the practical applications of our frameworks by instantiating them with different state-of-the-art submodular optimization algorithms and comparing their welfare performance through empirical experiments on publicly available datasets consisting of thousands of sellers.
    Abstract: We propose Heterogeneous Swarms, an algorithm to discover and adapt multi-LLM systems by jointly optimizing model roles and weights. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as input-output relationships and optimize the directed acyclic graph (DAG) of LLMs representing a multi-LLM system. Starting from a swarm of randomly initialized continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order with message passing, evaluate on the utility function, and optimize the adjacency matrices with swarm intelligence based on the utility score. For weight-step, we define JFK-score to evaluate the contribution of individual LLMs in the best-found DAG of the role-step, then optimize model weights with swarm intelligence based on the JFK-score. Extensive experiments demonstrate that Heterogeneous Swarms outperforms 15 baselines spanning role-based and weight-based approaches by 18.5% on average across 12 tasks and contexts. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of initial LLMs.
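The role-step above decodes a continuous adjacency matrix into a discrete DAG and then calls the LLMs in topological order with message passing. The sketch below shows one simple way that could look; the thresholding rule, the fixed node ordering used to guarantee acyclicity, and the callable "experts" standing in for LLM calls are all assumptions for illustration, not the paper's decoding scheme.

```python
import numpy as np

def decode_dag(continuous_adjacency, threshold=0.5):
    """Keep high-scoring edges, allowing only 'forward' edges under a fixed
    node ordering so the result is guaranteed to be acyclic."""
    A = np.asarray(continuous_adjacency)
    n = A.shape[0]
    dag = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] > threshold:
                dag[i, j] = 1
    return dag

def run_multi_llm_system(dag, experts, task_input):
    """Call each expert in topological order, passing along its parents' outputs.

    experts: list of callables standing in for LLM calls, expert(list[str]) -> str.
    """
    outputs = []
    for j, expert in enumerate(experts):
        parent_msgs = [outputs[i] for i in range(j) if dag[i, j]]
        outputs.append(expert([task_input] + parent_msgs))
    return outputs[-1]   # output of the final node as the system answer

# Toy usage: three "experts" chained by the decoded DAG.
experts = [lambda msgs: f"draft: {msgs[0]}",
           lambda msgs: f"review of ({'; '.join(msgs)})",
           lambda msgs: f"final answer given ({'; '.join(msgs)})"]
adjacency = np.array([[0.0, 0.9, 0.2],
                      [0.0, 0.0, 0.8],
                      [0.0, 0.0, 0.0]])
print(run_multi_llm_system(decode_dag(adjacency), experts, "solve task X"))
```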
    Collaborative Diffusion Model for Recommender System
    Gyuseok Lee
    Yaochen Zhu
    Hwanjo Yu
    Yao Zhou
    Jundong Li
    2025
    Abstract: Diffusion-based recommender systems (DR) have gained increasing attention for their advanced generative and denoising capabilities. However, existing DR models face two central limitations: (i) a trade-off between enhancing generative capacity via noise injection and preserving personalized information, and (ii) the underutilization of rich item-side information. To address these challenges, we present a Collaborative Diffusion model for Recommender System (CDiff4Rec). Specifically, CDiff4Rec generates pseudo-users from item features and leverages collaborative signals from both real and pseudo personalized neighbors identified through behavioral similarity, thereby effectively reconstructing nuanced user preferences. Experimental results on three public datasets show that CDiff4Rec outperforms competitors by effectively mitigating the loss of personalized information through the integration of item content and collaborative signals.