Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1–15 of 10,392 publications

Recent work has suggested utilizing inference compute, showing that scaling the number of samples consistently improves the fraction of problems solved by any attempt, namely the coverage. In this work, we suggest that inference-scaling gains should be compared with proper baselines, as some datasets become degenerate when a large number of attempts is allowed. We focus on two domains, mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method; otherwise, the results of inference scaling should be interpreted with caution.

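Since the abstract's argument turns on how coverage scales with the number of samples, a short sketch may help. Below is the standard unbiased pass@k (coverage) estimator used in the repeated-sampling literature; the sample counts are invented for illustration, and none of this code comes from the paper.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased coverage (pass@k) estimate: the probability that at least
    one of k attempts is correct, given c correct among n total samples."""
    if n - c < k:
        return 1.0
    # Numerically stable form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Coverage rises steadily with the attempt budget k; this is the scaling
# curve the paper argues must be compared against enumeration baselines.
for k in (1, 4, 16, 64):
    print(f"pass@{k} = {pass_at_k(n=100, c=5, k=k):.3f}")
```
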
Measuring productivity is equivalent to building a model. All models are wrong, but some are useful. Productivity models are often “worryingly selective” (wrong because of omissions). Worrying selectivity can be combated by taking a holistic approach that includes multiple measurements of multiple outcomes. Productivity models should include multiple outcomes, metrics, and methods.

The proliferation of IoT in cities, combined with Digital Twins, creates a rich data foundation for Smart Cities aimed at improving urban life and operations. Generative AI (GenAI) significantly enhances this potential, moving beyond traditional AI analytics by processing multimodal content and generating novel outputs like text and simulations. Using specialized or foundational models, GenAI's natural language abilities such as Natural Language Understanding (NLU) and Generation (NLG) can power tailored applications and unified interfaces, dramatically lowering barriers for users interacting with complex smart city systems. In this paper, we focus on GenAI applications based on conversational interfaces within the context of three critical user archetypes in a Smart City: Citizens, Operators, and Planners. We identify and review GenAI models and techniques that have been proposed or deployed for various urban subsystems in the contexts of these user archetypes. We also consider how GenAI can be built on the existing data foundation of official city records, IoT data streams, and Urban Digital Twins. We believe this work represents the first comprehensive summarization of GenAI techniques for Smart Cities from the lens of the critical users in a Smart City.

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on fixed parameters within linear projections, especially when architectural modifications (e.g., channel dimensions) are introduced. Each scaling iteration typically requires retraining the entire model from the beginning, leading to suboptimal utilization of computational resources. To overcome this limitation, we introduce TokenFormer, a naturally scalable architecture that leverages the attention mechanism exclusively for computations among input tokens and interactions between input tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in the Transformer with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This approach allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124 million to 1.4 billion parameters by incrementally adding new key-value parameters, achieving performance comparable to models trained from scratch while greatly reducing training costs. Code and models will be publicly available.

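To make the token-parameter attention idea concrete, here is a hedged PyTorch sketch under our own assumptions: input tokens act as queries against learnable key/value parameter tokens, and capacity grows by appending new rows. The paper's actual layer (its attention normalization, initialization, and growth schedule) may well differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParamAttention(nn.Module):
    """Illustrative stand-in for a linear projection: attention between
    input tokens (queries) and learnable parameter tokens (keys/values)."""
    def __init__(self, d_in: int, d_out: int, n_param_tokens: int):
        super().__init__()
        self.keys = nn.Parameter(0.02 * torch.randn(n_param_tokens, d_in))
        self.values = nn.Parameter(0.02 * torch.randn(n_param_tokens, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in) -> scores: (batch, seq, n_param_tokens)
        scores = x @ self.keys.T / self.keys.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ self.values

    @torch.no_grad()
    def grow(self, extra: int) -> None:
        # Progressive scaling: append key/value tokens, keep trained ones.
        self.keys = nn.Parameter(torch.cat(
            [self.keys, 0.02 * torch.randn(extra, self.keys.shape[1])]))
        self.values = nn.Parameter(torch.cat(
            [self.values, 0.02 * torch.randn(extra, self.values.shape[1])]))

layer = TokenParamAttention(d_in=64, d_out=64, n_param_tokens=128)
y = layer(torch.randn(2, 10, 64))  # (2, 10, 64)
layer.grow(64)                     # now 192 parameter tokens
```
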
Cloud platforms have been virtualizing storage devices like flash-based solid-state drives (SSDs) to make effective use of storage resources. They enable either software-isolated or hardware-isolated instances to facilitate storage sharing between multi-tenant applications. However, for decades they have had to combat the fundamental tussle between performance isolation and resource utilization, suffering from either long tail latency caused by weak isolation or low storage utilization caused by strong isolation. In this paper, we present FleetIO, a learning-based storage virtualization framework that employs reinforcement learning (RL) for managing virtualized SSDs. FleetIO explores the unique features of RL to handle the dynamic changes of application workloads and storage states, and integrates storage scheduling into the RL decision-making process. It achieves both performance isolation and improved storage utilization by enabling dynamic fine-grained storage harvesting across co-located application instances, while minimizing the negative impact on their service-level objectives (SLOs). FleetIO clusters workloads into different types (e.g., latency-sensitive and bandwidth-intensive) based on I/O traces collected at runtime, and fine-tunes the RL reward function for each type of workload. We implement FleetIO on a real programmable SSD board and evaluate it with diverse cloud applications. We show that FleetIO improves the overall storage utilization of the shared SSD by up to 1.4× and decreases the tail latency of I/O requests by 1.5× on average, compared to state-of-the-art storage sharing approaches.

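The abstract mentions per-workload-type RL reward functions. As a loose illustration of the trade-off involved (an invented form, not FleetIO's actual reward), one might combine a utilization bonus with a tail-latency SLO penalty:

```python
def harvest_reward(tail_latency_us: float, slo_us: float,
                   utilization: float, slo_weight: float = 4.0) -> float:
    """Hypothetical reward: encourage harvesting idle bandwidth (higher
    utilization), but penalize SLO violations sharply so the agent backs
    off before hurting latency-sensitive tenants."""
    slo_penalty = max(0.0, (tail_latency_us - slo_us) / slo_us)
    return utilization - slo_weight * slo_penalty

# A latency-sensitive workload type might use a larger slo_weight than a
# bandwidth-intensive one, mirroring the per-type tuning described above.
```
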
Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), has demonstrated significant potential in clinical reasoning skills such as history-taking and differential diagnosis generation, both critical aspects of medical education. This work explores how LLMs can augment medical curricula through interactive learning. We conducted a participatory design process with medical students, residents, and medical education experts to co-create an AI-powered tutor prototype for clinical reasoning. As part of the co-design process, we conducted a qualitative user study, investigating learning needs and practices via interviews and evaluating concepts through interactions with the prototype. Findings highlight the challenges learners face in transitioning from theoretical knowledge to practical application, and how an AI tutor can provide personalized practice and feedback. We conclude with design considerations, emphasizing the importance of context-specific knowledge and of emulating positive preceptor traits, to guide the development of AI tools for medical education.

Large Language Models (LLMs) are revolutionizing many areas of AI, but their substantial resource requirements limit their deployment on mobile and edge devices. This survey provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: knowledge distillation, model quantization, and model pruning. For each technique, we discuss the underlying principles, present different variants, and provide examples of successful applications. We also briefly discuss complementary techniques such as mixture-of-experts and early-exit strategies, and highlight promising future directions. We aim to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment. To the best of our knowledge, this is the first focused survey of LLM compression techniques from the lens of resource-constrained environments.

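As a concrete instance of one surveyed family, below is a minimal sketch of symmetric per-tensor int8 post-training quantization, a common baseline; it is our illustration rather than code from the survey.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights into [-127, 127] with a single scale factor."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
# 4x smaller weights; the error below is what finer-grained (per-channel,
# per-group) schemes in the survey's taxonomy aim to shrink further.
print("max abs error:", float(np.abs(w - dequantize(q, s)).max()))
```
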
This paper adopts a Usage-Based Construction Grammar perspective to compare human- and AI-generated language, focusing on Verb-Argument Constructions (VACs) as a lens for analysis. Specifically, we examine solicited advice texts in two domains, Finance and Medicine, produced by humans and by ChatGPT across different GPT models (3.5, 4, and 4o) and interfaces (3.5 Web vs. 3.5 API). Our findings reveal broad consistency in the frequency and distribution of the most common VACs across human- and AI-generated texts, though ChatGPT exhibits a slightly higher reliance on the most frequent constructions. A closer examination of the verbs occupying these constructions uncovers significant differences in the meanings conveyed: newer models drift away from human-like language production at the macro level (e.g., length) while moving toward human-like verb-VAC patterns. These results underscore the potential of VACs as a powerful tool for analyzing AI-generated language and tracking its evolution over time.

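For readers curious how VAC profiles can be extracted in practice, here is one hedged possibility using spaCy dependency parses: pair each verb lemma with its core argument labels. This is a rough proxy for illustration, not the paper's actual operationalization.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

CORE_ARGS = {"nsubj", "nsubjpass", "dobj", "iobj", "ccomp", "xcomp", "prep"}

def vac_profile(text: str) -> Counter:
    """Count (verb lemma, argument-frame) pairs as a crude VAC profile."""
    counts = Counter()
    for tok in nlp(text):
        if tok.pos_ == "VERB":
            frame = tuple(sorted(c.dep_ for c in tok.children
                                 if c.dep_ in CORE_ARGS))
            counts[(tok.lemma_, frame)] += 1
    return counts

profile = vac_profile("We recommend that you diversify your savings "
                      "and consult a licensed advisor before investing.")
print(profile.most_common())
```
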
Generative AI (GenAI), particularly Large Language Models (LLMs), offers powerful capabilities for interpreting the complex data landscape in healthcare. In this paper, we present a comprehensive overview of the capabilities, requirements, and applications of GenAI for deriving clinical insights and improving clinical efficiency. We first provide background on the forms and sources of patient data, namely real-time Remote Patient Monitoring (RPM) streams and traditional Electronic Health Records (EHR). The sheer volume and heterogeneity of this combined data present significant challenges to clinicians and contribute to information overload. We then explore the potential of LLM-powered applications for improving clinical efficiency: such applications can enhance navigation of longitudinal patient data and provide actionable clinical decision support through natural language dialogue. We discuss the opportunities this presents for streamlining clinician workflows and personalizing care, alongside critical challenges such as data integration complexity, ensuring data quality and RPM data reliability, maintaining patient privacy, validating AI outputs for clinical safety, mitigating bias, and ensuring clinical acceptance. We believe this work represents the first summarization of GenAI techniques for managing clinician data overload arising from combined RPM/EHR data.

    Software development is a team sport
    Jie Chen
    Alison Chang
    Rayven Plaza
    Marie Huber
    Claire Taylor
    IEEE Software (2025)
In this article, we describe our human-centered research on the role of collaboration and teamwork in productive software development. We describe the creation of a logs-based metric that identifies collaboration through observable events and a survey-based multi-item scale that assesses team functioning.

    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    The biennial Conference on Innovative Data Systems Research (2025)
In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache, inclusive not just of its misses but also of its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce total cache cost compared to traditional fixed-size cache policies.

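The ski-rental connection the abstract draws is the classic one: holding an idle cache entry accrues memory ("rent") cost over time, while evicting risks paying the miss ("buy") cost later. A minimal sketch of the resulting break-even eviction rule, with an invented cost model rather than the paper's algorithm:

```python
import time

def break_even_ttl(miss_cost: float, hold_cost_per_sec: float) -> float:
    """Deterministic ski rental: once cumulative holding cost matches the
    miss cost, evict. This simple rule is 2-competitive."""
    return miss_cost / hold_cost_per_sec

def evict_idle(cache: dict, last_access: dict, ttl: float) -> None:
    """Drop entries idle longer than the break-even TTL, shrinking the
    cache's memory footprint when items stop earning their keep."""
    now = time.monotonic()
    for key in [k for k, t in last_access.items() if now - t > ttl]:
        cache.pop(key, None)
        last_access.pop(key, None)

# With a miss costing the equivalent of 60 s of holding cost, entries idle
# for more than a minute are evicted; a learned ski-rental policy (as in
# the paper) would instead adapt this threshold to the workload.
```
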
This paper presents SYMBIOSIS, an AI-powered framework that makes Systems Thinking accessible for addressing societal challenges and unlocks paths for leveraging systems thinking to improve AI systems. The platform establishes a centralized, open-source repository of systems thinking and system dynamics models categorized by Sustainable Development Goals (SDGs) and societal topics using topic modeling and classification techniques. Systems Thinking resources, though critical for articulating causal theories in complex problem spaces, are often locked behind specialized tools and intricate notations, creating high barriers to entry. To address this, we developed a generative co-pilot that translates complex systems representations, such as causal loops and stock-flow diagrams, into natural language (and vice versa), allowing users to explore and build models without extensive technical training. Rooted in community-based system dynamics (CBSD) and informed by community-driven insights on societal context, we aim to bridge the problem-understanding chasm. This gap, driven by epistemic uncertainty, often limits ML developers who lack the community-specific knowledge essential for problem understanding and formulation, leading to misaligned causal theories and reduced intervention effectiveness. Recent research identifies causal and abductive reasoning as crucial frontiers for AI, and Systems Thinking provides a naturally compatible framework for both. By making Systems Thinking frameworks more accessible and user-friendly, we aim to provide a foundational step toward future research into responsible, society-centered AI that better integrates societal context through systems thinking frameworks and models. Our work underscores the need for ongoing research into AI's capacity to capture essential system dynamics such as feedback processes and time delays, paving the way for more socially attuned, effective AI systems.

Sign Language Translation (SLT) aims to map sign language videos to spoken language text. A common approach leverages gloss annotations as an intermediate representation, decomposing SLT into two sub-tasks: video-to-gloss recognition and gloss-to-text translation. While effective, this paradigm relies on expert-annotated gloss labels, which are costly and increasingly unavailable in many datasets, limiting scalability. To address this challenge, we propose a gloss-free pseudo-gloss generation framework that eliminates the need for human-annotated glosses while preserving the structured intermediate representation. Specifically, we prompt a Large Language Model (LLM) with example text-gloss pairs to extract potential sign-related gloss words from the text, leveraging its in-context learning capability. To mitigate the inherent misalignment between generated pseudo-glosses and the sign sequences in the video, we further refine their order by formulating the alignment as a weakly supervised learning problem. With the reordered pseudo-glosses, additional alignment losses such as CTC can be incorporated to enhance supervision. We train our SLT model, comprising a vision encoder and a translator, under a three-stage pipeline, effectively bridging the gap between sign and spoken language. Despite its simplicity, our approach outperforms previous state-of-the-art gloss-free frameworks across three SLT benchmarks and achieves competitive results with gloss-based methods.

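To illustrate the in-context pseudo-gloss step described above, here is a hedged sketch of a few-shot prompt builder; the prompt wording, the example glosses, and the `llm` client are all illustrative assumptions, not the authors' actual setup.

```python
def build_gloss_prompt(examples: list[tuple[str, str]], sentence: str) -> str:
    """Assemble a few-shot prompt asking an LLM to extract candidate
    gloss words from spoken-language text."""
    shots = "\n\n".join(f"Text: {t}\nGloss: {g}" for t, g in examples)
    return ("Extract sign-language gloss words from the text.\n\n"
            f"{shots}\n\nText: {sentence}\nGloss:")

examples = [
    ("Do you like coffee?", "COFFEE YOU LIKE"),
    ("The weather is nice today.", "TODAY WEATHER NICE"),
]
prompt = build_gloss_prompt(examples, "Where is the train station?")
# pseudo_gloss = llm(prompt)  # hypothetical LLM call; the paper then
# reorders these pseudo-glosses via weakly supervised alignment.
```
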
    Triaging mammography with artificial intelligence: an implementation study
    Sarah M. Friedewald
    Sunny Jansen
    Fereshteh Mahvar
    Timo Kohlberger
    David V. Schacht
    Sonya Bhole
    Dipti Gupta
    Scott Mayer McKinney
    Stacey Caron
    David Melnick
    Mozziyar Etemadi
    Samantha Winter
    Alejandra Maciel
    Luca Speroni
    Martha Sevenich
    Arnav Agharwal
    Rubin Zhang
    Gavin Duggan
    Shiro Kadowaki
    Atilla Kiraly
    Jie Yang
    Basil Mustafa
    Krish Eswaran
    Shravya Shetty
    Breast Cancer Research and Treatment (2025)
Purpose: Many breast centers are unable to provide immediate results at the time of screening mammography, which delays patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis.
Methods: In this prospective, randomized, unblinded, controlled implementation study, we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results: The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for analysis of TA and TB. In the control group, TA was 25.6 days [95% CI 22.0–29.9] and TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p < 0.001) and mean TB was reduced by 30% (16.8 fewer days [one-sided 95% CI > 5.1], p = 0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions: Implementing AI prioritization can accelerate care timelines for patients requiring additional workup while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could contribute to improved patient adherence, decreased anxiety, and reduced disparities in access to timely care.

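For readers unfamiliar with the one-sided Mann-Whitney U test used for TA and TB, a minimal SciPy sketch follows; the arrays are invented for illustration, as the study's data are not reproduced here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical time-to-additional-imaging samples, in days.
ta_experimental = np.array([3.1, 5.0, 8.2, 12.4, 20.1, 25.3])
ta_control = np.array([10.5, 18.2, 24.9, 27.8, 35.0, 41.2])

# One-sided test: is TA in the experimental arm stochastically smaller
# than in the control arm?
stat, p = mannwhitneyu(ta_experimental, ta_control, alternative="less")
print(f"U = {stat}, one-sided p = {p:.4f}")
```
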
    ESAM++: Efficient Online 3D Perception on the Edge
    Qin Liu
    Lavisha Aggarwal
    Vikas Bahirwani
    Lin Li
    Aleksander Holynski
    Saptarashmi Bandyopadhyay
    Zhengyang Shen
    Marc Niethammer
    Ehsan Adeli
    Andrea Colaco
    2025
Online 3D scene perception in real time is critical for robotics, AR/VR, and autonomous systems, particularly in edge computing scenarios where computational resources are limited. Recent state-of-the-art methods like EmbodiedSAM (ESAM) demonstrate the promise of online 3D perception by leveraging a 2D visual foundation model (VFM) with efficient 3D query lifting and merging. However, ESAM depends on a computationally expensive sparse 3D U-Net for point cloud feature extraction, which we identify as the primary efficiency bottleneck. In this paper, we propose a lightweight and scalable alternative for online 3D scene perception tailored to edge devices. Our method introduces a 3D Sparse Feature Pyramid Network (SFPN) that efficiently captures multi-scale geometric features from streaming 3D point clouds while significantly reducing computational overhead and model size. We evaluate our approach on four challenging segmentation benchmarks (ScanNet, ScanNet200, SceneNN, and 3RScan), demonstrating that our model achieves competitive accuracy with up to 3× faster inference and a 3× smaller model size compared to ESAM, enabling practical deployment in real-world edge scenarios. Code and models will be released.
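
As a rough, dense simplification of the feature-pyramid idea (the actual SFPN operates on sparse 3D point-cloud tensors, which this sketch does not model), consider the following PyTorch fragment; channel widths and depth are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramid3D(nn.Module):
    """Two-level dense 3D feature pyramid: a strided encoder produces
    multi-scale features, and a top-down pathway with lateral 1x1 convs
    merges them into per-scale outputs."""
    def __init__(self, c_in: int = 3, c: int = 32):
        super().__init__()
        self.down1 = nn.Conv3d(c_in, c, 3, stride=2, padding=1)
        self.down2 = nn.Conv3d(c, 2 * c, 3, stride=2, padding=1)
        self.lat1 = nn.Conv3d(c, c, 1)
        self.lat2 = nn.Conv3d(2 * c, c, 1)

    def forward(self, x: torch.Tensor):
        f1 = F.relu(self.down1(x))   # 1/2 resolution
        f2 = F.relu(self.down2(f1))  # 1/4 resolution
        p2 = self.lat2(f2)           # coarsest pyramid level
        # Top-down merge: upsample the coarse level onto the finer one.
        p1 = self.lat1(f1) + F.interpolate(p2, size=f1.shape[2:])
        return p1, p2

p1, p2 = TinyPyramid3D()(torch.randn(1, 3, 32, 32, 32))
print(p1.shape, p2.shape)  # (1, 32, 16, 16, 16) (1, 32, 8, 8, 8)
```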