Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10235 publications
Triaging mammography with artificial intelligence: an implementation study
Sarah M. Friedewald
Sunny Jansen
Fereshteh Mahvar
Timo Kohlberger
David V. Schacht
Sonya Bhole
Dipti Gupta
Scott Mayer McKinney
Stacey Caron
David Melnick
Mozziyar Etemadi
Samantha Winter
Alejandra Maciel
Luca Speroni
Martha Sevenich
Arnav Agharwal
Rubin Zhang
Gavin Duggan
Shiro Kadowaki
Atilla Kiraly
Jie Yang
Basil Mustafa
Krish Eswaran
Shravya Shetty
Breast Cancer Research and Treatment (2025)
Preview abstract
Purpose
Many breast centers are unable to provide immediate results at the time of screening mammography which results in delayed patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis.
Methods
In this prospective randomized, unblinded, controlled implementation study we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results
The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for analysis of TA and TB. In the control group, the TA was 25.6 days [95% CI 22.0–29.9] and TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days; 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions
Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could contribute to improved patient adherence, decreased anxiety and addressing disparities in access to timely care.
View details
Beyond Touchscreens: Designing for Co-Occurring Accessibility Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025) (to appear)
Preview abstract
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interview with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones, however identifying accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
View details
Preview abstract
Storage on Android has evolved significantly over the years, with each new Android version introducing changes aimed at enhancing usability, security, and privacy. While these updates typically help with restricting app access to storage through various mechanisms, they may occasionally introduce new complexities and vulnerabilities. A prime example is the introduction of scoped storage in Android 10, which fundamentally changed how apps interact with files. While intended to enhance user privacy by limiting broad access to shared storage, scoped storage has also presented developers with new challenges and potential vulnerabilities to address. However, despite its significance for user privacy and app functionality, no systematic studies have been performed to study Android’s scoped storage at depth from a security perspective. In this paper, we present the first systematic security analysis of the scoped storage mechanism. To this end, we design and implement a testing tool, named ScopeVerif, that relies on differential analysis to uncover security issues and implementation inconsistencies in Android’s storage. Specifically, ScopeVerif takes a list of security properties and checks if there are any file operations that violate any security properties defined in the official Android documentation. Additionally, we conduct a comprehensive analysis across different Android versions as well as a cross-OEM analysis to identify discrepancies in different implementations and their security implications. Our study identifies both known and unknown issues of scoped storage. Our cross-version analysis highlights undocumented changes as well as partially fixed security loopholes across versions. Additionally, we discovered several vulnerabilities in scoped storage implementations by different OEMs. These vulnerabilities stem from deviations from the documented and correct behavior, which potentially poses security risks. The affected OEMs and Google have acknowledged our findings and offered us bug bounties in response.
View details
Governance, Risk and Compliance (GRC) Engineering: Data, AI, Automation, and the Future of Compliance to Audits
Eric Zhang
Ruchi Khurana
Vikram Khare
2025
Preview abstract
In today's rapidly evolving business landscape, Governance, Risk, and Compliance (GRC) leaders in large, complex organizations face unprecedented challenges. The cloud has revolutionized how businesses operate, offering unprecedented scalability, flexibility, cost-efficiency, additional security and resilience. However, this transformation also presents new challenges for GRC professionals. In a cloud-native world, where applications are built and deployed in dynamic, distributed environments, traditional GRC on-prem approaches, manual processes and spreadsheets struggle to keep pace. The key to success lies in embracing a data-driven GRC strategy that leverages the power of the cloud to enhance agility, visibility, and resilience.
View details
Scaling Laws for Downstream Task Performance in Machine Translation
Natalia Ponomareva
Hussein Hazimeh
Sanmi Koyejo
International Conference on Learning Representations (ICLR) (2025) (to appear)
Preview abstract
Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the \emph{pretraining} data and its size affect downstream performance (translation quality) as judged by: downstream cross-entropy and translation quality metrics such as BLEU and COMET scores. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and translation quality scores improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream translation quality metrics with good accuracy using a log-law. However, there are cases where moderate misalignment causes the downstream translation scores to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these, we provide new practical insights for choosing appropriate pretraining data.
View details
Preview abstract
Mainstream artificial neural network models, such as Deep Neural Networks (DNNs) are computation-heavy and energy-hungry. Weightless Neural Networks (WNNs) are natively built with RAM-based neurons and represent an entirely distinct type of neural network computing compared to DNNs. WNNs are extremely low-latency, low-energy, and suitable for efficient, accurate, edge inference. The WNN approach derives an implicit inspiration from the decoding process observed in the dendritic trees of biological neurons, making neurons based on Random Access Memories (RAMs) and/or Lookup Tables (LUTs) ready-to-deploy neuromorphic digital circuits. Since FPGAs are abundant in LUTs, LUT based WNNs are a natural fit for implementing edge inference in FPGAs.
WNNs has been demonstrated to be an energetically efficient AI model, both in software, as well as in hardware. For instance, the most recent DWN – Differential Weightless Neural Network – model demonstrates up to 135× reduction in energy costs in FPGA implementations compared to other multiplication-free approaches, such as binary neural networks (BNNs) and DiffLogicNet, up to 9% higher accuracy in deployments on constrained devices, and culminate in up to 42.8× reduction in circuit area for ultra-low-cost chip implementations. This tutorial will help participants understand how WNNs work, why WNNs were underdogs for such a long time, and be introduced to the most recent members of the WNN family, such as BTHOWeN , LogicWiSARD, COIN, ULEEN and DWN, and contrast to BNNs and LogicNets.
View details
SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts
Seonghun Son
Berk Gulmezoglu
ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2025) (to appear)
Preview abstract
Self-modifying code (SMC) allows programs to alter their own instructions, optimizing performance and functionality on x86 processors. Despite its benefits, SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes. In this paper, we explore the security implications of SMC by examining how specific x86 instructions affecting instruction cache lines lead to measurable timing discrepancies between cache hits and misses. These discrepancies facilitate refined cache attacks, making them less noisy and more effective. We introduce novel attack techniques that leverage these timing variations to enhance existing methods such as Prime+Probe and Flush+Reload. Our advanced techniques allow adversaries to more precisely attack cryptographic keys and create covert channels akin
to Spectre across various x86 platforms. Finally, we propose a dynamic detection methodology utilizing hardware performance counters to mitigate these enhanced threats.
View details
Preview abstract
The problem of contract design addresses the challenge of moral hazard in principle-agent setups. The agent exerts costly efforts that produce a random outcome with an associated reward for the principal. Moral hazard refers to the tension that the principal cannot observe the agent’s effort level hence needs to incentivize the agent only through rewarding the realized effort outcome, i.e., the contract. Bayesian contract design studies the principal’s design problem of an optimal contract when facing an unknown agent characterized by a private Bayesian type. In its most general form, the agent’s type is inherently “multi-parameter” and can arbitrarily affect both the agent’s productivity and effort costs. In contrast, a natural single-parameter setting of much recent interest simplifies the agent’s type to a single value that describes the agent’s cost per unit of effort, whereas agents’ efforts are assumed to be equally
productive.
The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance I^M, we construct a single-parameter instance I^S such that any β-approximate contract (resp. menu of contracts) of I^S can in turn be converted to a (β − ϵ)-approximate contract (resp. menu of contracts) of I^M. The reduction is in time polynomial in the input size and log(1/ϵ); moreover, when β = 1 (i.e., the given single-parameter solution is exactly optimal), the dependence on 1/ϵ can be removed, leading to a polynomial-time exact reduction. This efficient reduction is somewhat surprising because in the closely related problem of Bayesian mechanism design, a polynomial-time reduction from multi-parameter to single-parameter setting is believed to not exist. Our result demonstrates the intrinsic difficulty of addressing moral hazard in Bayesian contract design, regardless of being single-parameter or multi-parameter.
As byproducts, our reduction answers two open questions in recent literature of algorithmic contract design: (a) it implies that optimal contract design in single-parameter BCD is not in APX unless P=NP even when the agent’s type distribution is regular, answering the open question of [3] in the negative; (b) it implies that the principal’s (order-wise) tight utility gap between using a menu of contracts and a single contract is Θ(n) where n is the number of actions, answering the major open question of [27] for the single-parameter case.
View details
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Marc Stogaitis
Tajinder Gadh
Richard Allen
Alexei Barski
Robert Bosch
Patrick Robertson
Youngmin Cho
Nivetha Thiruverahan
Aman Raj
Geophysical Journal International (2025), ggae436
Preview abstract
This paper presents a novel approach for estimating the ground shaking intensity using real-time social media data and CCTV footage. Employing the Gemini 1.5 Pro’s (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model’s output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini’s ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds a great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.
View details
InstructPipe: Building Visual Programming Pipelines in Visual Blocks with Human Instructions Using LLMs
Zhongyi Zhou
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Kristen Wright
Jason Mayes
Mark Sherwood
Alex Olwal
Ram Iyengar
Na Li
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
Preview abstract
Visual programming provides beginner-level programmers with a coding-free experience to build their customized pipelines. Existing systems require users to build a pipeline entirely from scratch, implying that novice users need to set up and link appropriate nodes all by themselves, starting from a blank workspace. We present InstructPipe, an AI assistant that enables users to start prototyping machine learning (ML) pipelines with text instructions. We designed two LLM modules and a code interpreter to execute our solution. LLM modules generate pseudocode of a target pipeline, and the interpreter renders a pipeline in the node-graph editor for further human-AI collaboration. Technical evaluations reveal that InstructPipe reduces user interactions by 81.1% compared to traditional methods. Our user study (N=16) showed that InstructPipe empowers novice users to streamline their workflow in creating desired ML pipelines, reduce their learning curve, and spark innovative ideas with open-ended commands.
View details
Linear Elastic Caching via Ski Rental
Todd Lipcon
The biennial Conference on Innovative Data Systems Research (2025)
Preview abstract
In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies.
View details
Security Signals: Making Web Security Posture Measurable At Scale
David Dworken
Artur Janc
Santiago (Sal) Díaz
Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb)
Preview abstract
The area of security measurability is gaining increased attention, with a wide range of organizations calling for the development of scalable approaches for assessing the security of software systems and infrastructure. In this paper, we present our experience developing Security Signals, a comprehensive system providing security measurability for web services, deployed in a complex application ecosystem of thousands of web services handling traffic from billions of users. The system collects security-relevant information from production HTTP traffic at the reverse proxy layer, utilizing novel concepts such as synthetic signals augmented with additional risk information to provide a holistic view of the security posture of individual services and the broader application ecosystem. This approach to measurability has enabled large-scale security improvements to our services, including prioritized rollouts of security enhancements and the implementation of automated regression monitoring. Furthermore, it has proven valuable for security research and prioritization of defensive work. Security Signals addresses shortcomings of prior web measurability proposals by tracking a comprehensive set of security properties relevant to web applications, and by extracting insights from collected data for use by both security experts and non-experts. We believe the lessons learned from the implementation and use of Security Signals offer valuable insights for practitioners responsible for web service security, potentially inspiring new approaches to web security measurability.
View details
Circadian rhythm of heart rate and activity: a cross-sectional study
Maryam Khalid
Logan Schneider
Aravind Natarajan
Conor Heneghan
Karla Gleichauf
Chronobiology International (2025)
Preview abstract
ABSTRACT
Background: Circadian rhythms are commonly observed in a number of physiological processes. Consumer wearable devices have made it possible to obtain continuous time series data from a large number of individuals. We study circadian rhythms from measurements of heart rate, movement, and sleep, from a cohort of nearly 20,000 participants over the course of 30 days.
Methods: Participation was restricted to Fitbit users of age 21 years or older residing in the United States or Canada. Participants were enrolled through a recruitment banner shown on the Fitbit App. The advertisement was shown to 531,359 Fitbit users, and 23,239 enrolled in the program. Of these, we obtained heart rate data from 19,350 participants. We obtain the underlying circadian rhythm from time series heart rate by modeling the circadian rhythm as a sum over the first two Fourier harmonics. The first Fourier harmonic accounts for the 24-hour rhythmicity, while the second harmonic accounts for non-sinusoidal perturbations.
Findings: We observe a circadian rhythm in both heart rate and acceleration. From the diurnal modulation, we obtain the following circadian parameters: (i) amplitude of modulation, (ii) bathyphase, (iii) acrophase, (iv) non-sinusoidal fraction, and (v) fraction of day when the heart rate is greater than the mean. The amplitude, bathyphase, and acrophase depend on sex, and decrease with age. The waketime on average, follows the bathyphase by 2.4 hours. In most individuals, the circadian rhythm of heart rate lags the circadian rhythm of activity.
Interpretation: Circadian metrics for heart rate and activity can be reliably obtained from commercially available wearable devices. Distributions of circadian metrics can be valuable tools for individual-level interpretation.
View details
Databases in the Era of Memory-Centric Computing
Anastasia Ailamaki
Lawrence Benson
Helena Caminal
Jana Gičeva
Eric Seldar
Lisa Wu Wills
2025
Preview abstract
The increasing disparity between processor core counts and memory bandwidth, coupled with the rising cost and underutilization of memory, introduces a performance and cost Memory Wall and presents a significant challenge to the scalability of database systems. We argue that current processor-centric designs are unsustainable, and we advocate for a shift towards memory-centric computing, where disaggregated memory pools enable cost-effective scaling and robust performance. Database systems are uniquely positioned to leverage memory-centric systems because of their intrinsic data-centric nature. We demonstrate how memory-centric database operations can be realized with current hardware, paving the way for more efficient and scalable data management in the cloud.
View details
Preview abstract
Multimodal AI Agents are AI models that have the capability of interactively and cooperatively assisting human users to solve day-to-day tasks. Augmented Reality (AR) head worn devices can uniquely improve the user experience of solving procedural day-to-day tasks by providing egocentric multimodal (audio and video) observational capabilities to AI Agents. Such AR capabilities can help the AI Agents see and listen to actions that users take which can relate to multimodal capabilities of human users. Existing AI Agents, either Large Language Models (LLMs) or Multimodal Vision-Language Models (VLMs) are reactive in nature, which means that models cannot take an action without reading or listening to the human user's prompts. Proactivity of AI Agents, on the other hand, can help the human user detect and correct any mistakes in agent observed tasks, encourage users when they do tasks correctly, or simply engage in conversation with the user - akin to a human teaching or assisting a user. Our proposed YET to Intervene (YETI) multimodal Agent focuses on the research question of identifying circumstances that may require the Agent to intervene proactively. This allows the Agent to understand when it can intervene in a conversation with human users that can help the user correct mistakes on tasks, like cooking, using Augmented Reality. Our YETI Agent learns scene understanding signals based on interpretable notions of Structural Similarity (SSIM) on consecutive video frames. We also define the alignment signal which the AI Agent can learn to identify if the video frames corresponding to the user's actions on the task are consistent with expected actions. These signals are used by our AI Agent to determine when it should proactively intervene. We compare our results on the instances of proactive intervention in the HoloAssist multimodal benchmark for an expert agent guiding an user agent to complete procedural tasks.
View details