Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Silent Data Corruption by 10× Test Escapes Threatens Reliable Computing
Rama Govindaraju
Eric Liu
Subhasish Mitra
Mike Fuller
IEEE (2025) (to appear)
Abstract:
"Silent Data Corruption by 10× Test Escapes Threatens Reliable Computing" highlights a critical issue: manufacturing defects, dubbed "test escapes," are evading current testing methods at a rate ten times higher than industry targets. These defects lead to Silent Data Corruption (SDC), where applications produce incorrect outputs without error indications, costing companies significantly in debugging, data recovery, and service disruptions. The paper proposes a three-pronged approach: quick diagnosis of defective chips directly from system-level behaviors, in-field detection using advanced testing and error-detection techniques such as CASP, and new, rigorous test experiments to validate these solutions and improve manufacturing testing practices.
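The in-field detection idea lends itself to a software-level illustration. Below is a minimal sketch, not the paper's CASP mechanism: it reruns a fixed deterministic kernel and compares a checksum against a reference captured on known-good hardware. The kernel choice, rounding, and repetition count are our assumptions.

```python
import hashlib
import numpy as np

def golden_signature(seed: int = 0, n: int = 256) -> str:
    """Checksum of a fixed, deterministic compute kernel."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    # Round to limit sensitivity to benign floating-point variation.
    product = np.round(a @ b, decimals=6)
    return hashlib.sha256(product.tobytes()).hexdigest()

def in_field_self_test(reference: str, runs: int = 3) -> bool:
    """Rerun the kernel; any mismatch flags a potential silent error."""
    return all(golden_signature() == reference for _ in range(runs))

if __name__ == "__main__":
    ref = golden_signature()  # captured once on known-good hardware
    print("self-test passed:", in_field_self_test(ref))
```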
ExfilState: Automated Discovery of Timer-Free Cache Side Channels on ARM CPUs
Fabian Thomas
Michael Torres
Michael Schwarz
ACM Conference on Computer and Communications Security (CCS) (2025) (to appear)
Capturing Real-World Habitual Sleep Patterns with a Novel User-centric Algorithm to Pre-Process Fitbit Data in the All of Us Research Program: Retrospective observational longitudinal study
Hiral Master
Jeffrey Annis
Karla Gleichauf
Lide Han
Peyton Coleman
Kelsie Full
Neil Zheng
Doug Ruderfer
Logan Schneider
Evan Brittain
Journal of Medical Internet Research (2025)
Abstract:
Background:
Commercial wearables such as Fitbit quantify sleep metrics using fixed calendar times as default measurement periods, which may not adequately account for individual variations in sleep patterns. To address this limitation, experts in sleep medicine and wearable technology developed a user-centric algorithm designed to more accurately reflect actual sleep behaviors and improve the validity of wearable-derived sleep metrics.
Objective:
This study aims to describe the development of a new user-centric algorithm, compare its performance with the default calendar-relative algorithm, and provide a practical guide for analyzing All of Us Fitbit sleep data on a cloud-based platform.
Methods:
The default and user-centric algorithms were implemented to preprocess and compute sleep metrics related to schedule, duration, and disturbances using high-resolution Fitbit sleep data from 8563 participants (median age 58.1 years; 6002/8341, 71.96% female) in the All of Us Research Program (version 7 Controlled Tier). Variations in typical sleep patterns were calculated by examining the differences in the mean number of primary sleep logs classified by each algorithm. Linear mixed-effects models were used to compare differences in sleep metrics across quartiles of variation in typical sleep patterns.
Results:
Out of 8,452,630 total sleep logs collected over a median of 4.2 years of Fitbit monitoring, 401,777 (4.75%) nonprimary sleep logs identified by the default algorithm were reclassified as primary sleep by the user-centric algorithm. Variation in typical sleep patterns ranged from –0.08 to 1. Among participants with the greatest variation in typical sleep patterns, the user-centric algorithm identified significantly more total sleep time (by 17.6 minutes; P<.001), more wake after sleep onset (by 13.9 minutes; P<.001), and lower sleep efficiency (by 2.0%; P<.001), on average. Differences in sleep stage metrics between the 2 algorithms were modest.
Conclusions:
The user-centric algorithm captures the natural variability in sleep schedules, providing an alternative approach to preprocess and evaluate sleep metrics related to schedule, duration, and disturbances. A publicly available R package facilitates the implementation of this algorithm for clinical and translational research.
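To make the reclassification step concrete, here is a minimal sketch of the user-centric idea as we read it from the abstract: a log counts as primary sleep if its onset falls near a person-specific habitual onset learned from that user's history, rather than inside a fixed calendar window. The modal-onset-hour heuristic and the 4-hour window are our assumptions; the authors' publicly available R package implements the actual algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SleepLog:
    start: datetime
    end: datetime

def habitual_onset_hour(logs: list[SleepLog]) -> int:
    """Most common sleep-onset hour for this user (illustrative heuristic)."""
    return Counter(log.start.hour for log in logs).most_common(1)[0][0]

def is_primary(log: SleepLog, onset_hour: int, window_hours: int = 4) -> bool:
    """Primary sleep if onset is within +/- window_hours of the habitual onset."""
    diff = abs(log.start.hour - onset_hour)
    return min(diff, 24 - diff) <= window_hours

logs = [
    SleepLog(datetime(2024, 1, d, 23, 30), datetime(2024, 1, d + 1, 7, 0))
    for d in range(1, 6)
]
logs.append(SleepLog(datetime(2024, 1, 6, 14, 0), datetime(2024, 1, 6, 15, 0)))  # afternoon nap

onset = habitual_onset_hour(logs)
print([is_primary(log, onset) for log in logs])  # the nap is not classified as primary sleep
```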
HueManity: Probing Fine-Grained Visual Perception in MLLMs
Rynaa Grover
Jayant Tamarapalli
Sahiti Yerramilli
Nilay Pande
(2025)
Abstract:
Multimodal Large Language Models (MLLMs) excel at high-level visual reasoning, but their performance on nuanced perceptual tasks remains surprisingly limited. We present HueManity, a benchmark designed to assess visual perception in MLLMs. The dataset comprises 83,850 images featuring two-character alphanumeric strings embedded in Ishihara-test-style dot patterns, challenging models on precise pattern recognition. Our evaluation of nine state-of-the-art MLLMs on HueManity demonstrates a significant performance deficit compared to human and traditional computer vision baselines. The best-performing MLLM achieved 33.6% accuracy on the numeric "easy" task and a striking 3% on the alphanumeric "hard" task. In contrast, human participants achieved near-perfect scores (100% and 95.6%), and a fine-tuned ResNet50 model reached accuracies of 96.5% and 94.5%. These results highlight a critical gap in the visual capabilities of current MLLMs. Our analysis further explores potential architectural and training-paradigm factors contributing to this perceptual gap. We will open-source the HueManity dataset and code to foster further research in improving the perceptual robustness of MLLMs.
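For intuition about the stimuli, here is a minimal sketch of an Ishihara-style dot pattern with an embedded character, assuming Pillow. The glyph mask, dot densities, and colors are arbitrary placeholders, not the benchmark's actual generation parameters.

```python
import random
from PIL import Image, ImageDraw

def glyph_mask(ch: str, size: int = 256) -> Image.Image:
    """Render one character at low resolution, then upscale to a binary mask."""
    small = Image.new("L", (12, 16), 0)
    ImageDraw.Draw(small).text((2, 2), ch, fill=255)  # default bitmap font
    return small.resize((size, size), Image.NEAREST)

def dot_pattern(ch: str, size: int = 256, n_dots: int = 4000) -> Image.Image:
    mask = glyph_mask(ch, size)
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    fg = [(200, 60, 60), (230, 120, 90)]   # dot colors inside the glyph
    bg = [(90, 140, 90), (120, 170, 110)]  # dot colors outside the glyph
    for _ in range(n_dots):
        x, y = random.randrange(size), random.randrange(size)
        r = random.randint(2, 5)
        color = random.choice(fg if mask.getpixel((x, y)) > 0 else bg)
        draw.ellipse((x - r, y - r, x + r, y + r), fill=color)
    return img

dot_pattern("A").save("huemanity_like_A.png")
```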
Study of Arterials in the City of Rio de Janeiro for Traffic Coordination
Ori Rottenstreich
Eliav Buchnik
Danny Veikherman
Dan Karliner
Tom Kalvari
Shai Ferster
Ron Tsibulsky
Jack Haddad
2025
Abstract:
Urban traffic congestion is a growing challenge, and optimizing signal timing strategies is crucial for improving traffic flow and reducing emissions. The coordination of signalized intersections improves both traffic operations and environmental aspects. Coordination is particularly important along arterials, sequences of signalized intersections that serve as primary routes and carry high traffic volumes. In this paper we analyze real data from the city of Rio de Janeiro to study properties of arterials: their length, the distance between intersections, and properties of the traffic-light plans such as cycle time. We then study their in-practice level of coordination in terms of the number of stops and their common locations along the arterials. We examine particular arterials in depth and provide insights that can be useful for the efficient design of arterials in additional cities. Based on the analysis, we show how simple traffic properties can indicate the potential of coordinating two adjacent intersections as part of an arterial to improve traffic performance.
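The coordination potential the abstract refers to builds on the classic green-wave relationship between signal spacing, progression speed, and cycle time. A minimal sketch with illustrative values; these functions are textbook relations, not the paper's method.

```python
def ideal_offset_s(distance_m: float, speed_mps: float, cycle_s: float) -> float:
    """Offset so a platoon released at the first signal arrives at the
    start of green at the second (green-wave condition)."""
    return (distance_m / speed_mps) % cycle_s

def coordinable(cycle_a_s: float, cycle_b_s: float) -> bool:
    """A shared (or integer-multiple) cycle time is a precondition for coordination."""
    lo, hi = sorted((cycle_a_s, cycle_b_s))
    return hi % lo == 0

# Example: 400 m spacing, 40 km/h progression speed, 90 s common cycle.
print(coordinable(90, 90), round(ideal_offset_s(400, 40 / 3.6, 90), 1), "s")
```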
PROTECT: A Framework to Foster Digital Resilience for Youth Navigating Technology-Facilitated Abuse
Diana Freed
Natalie Bazarova
Dan Cosley
Patrick Gage Kelley
Social Sciences Journal, 14(6) (2025)
Abstract:
Youth are increasingly exposed to a broad range of technology-facilitated abuse that challenges their safety and well-being. Building on previous work that examined youth help-seeking behaviors, coping strategies, the threats they encounter, and the social support systems around them, we articulate a framework called PROTECT (Problem recognition, Reaching out, Organizing support, Training, Engaging experts, Continuous support, and Tackling safety measures), which integrates existing models of support, help-seeking, and digital skills to offer a high-level, structured approach for adults who serve as a support system to youth navigating technology-facilitated abuse. The framework unpacks the social and contextual dynamics that influence help-seeking behaviors, providing a foundation for educators, advocates, health professionals, developers, and other adult stakeholders to design and develop trauma-informed, timely interventions that promote resilience.
User-Centered Delivery of AI-Powered Health Care Technologies in Clinical Settings: Mixed Methods Case Study
Randall Brandt
Hien Brown
Christine Silva
JMIR Human Factors (2025)
Abstract:
Background:
Providers spend a large percentage of their day using electronic health record (EHR) technology and frequently report frustration when EHR tasks are time-consuming and effortful. To solve these challenges, artificial intelligence (AI)–based enhancements to EHR technology are increasingly being deployed. However, AI-based implementations for EHR features often lack user-centered evaluation.
Objective:
This study evaluates, using a user-centered approach, the implementation of an AI-powered search and clinical discovery tool within an EHR system.
Methods:
We conducted a mixed methods study consisting of interviews, observations, and surveys for 5 months.
Results:
High adoption rates for the AI-based features (163/176, 93% of users after 3 months) and significant increases across key metrics, including user satisfaction (U=49; P<.001) and perception of time saved (U=49; P<.001), demonstrated that the AI-based features were not only successfully integrated into various clinical workflows but also improved the user experience for clinicians.
Conclusions:
Our results underscore the feasibility and effectiveness of using a user-centered approach for the deployment of clinical AI tools. High adoption rates and positive user experiences were driven by our user-centered research program, which emphasized close collaboration with users, rapid incorporation of feedback, and tailored user training. This study program can be used as a starting framework for the design and integration of human-centered research methods for AI tool deployment in clinical settings.
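The satisfaction and time-saved comparisons reported above (U=49; P<.001) are Mann-Whitney U statistics. A minimal sketch of the same style of test, assuming scipy; the ratings below are synthetic placeholders, not study data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical 5-point satisfaction ratings before and after the AI rollout.
before = [2, 3, 2, 3, 2, 4, 3, 2, 3, 2]
after = [4, 5, 4, 4, 5, 5, 4, 4, 5, 4]

u_stat, p_value = mannwhitneyu(before, after, alternative="two-sided")
print(f"U={u_stat:.0f}, P={p_value:.4f}")
```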
Dynamical-generative downscaling of climate model ensembles
Tapio Schneider
John Anderson
Proceedings of the National Academy of Sciences, 122 (2025), e2420288122
Abstract:
Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate projection ensembles. We propose an approach combining dynamical downscaling with generative AI to reduce the cost and improve the uncertainty estimates of downscaled climate projections. In our framework, an RCM dynamically downscales ESM output to an intermediate resolution, followed by a generative diffusion model that further refines the resolution to the target scale. This approach leverages the generalizability of physics-based models and the sampling efficiency of diffusion models, enabling the downscaling of large multimodel ensembles. We evaluate our method against dynamically downscaled climate projections from the Coupled Model Intercomparison Project Phase 6 (CMIP6) ensemble. Our results demonstrate its ability to provide more accurate uncertainty bounds on future regional climate than alternatives such as dynamical downscaling of smaller ensembles or traditional empirical statistical downscaling methods. We also show that dynamical-generative downscaling results in significantly lower errors than popular statistical downscaling techniques, and captures more accurately the spectra, tail dependence, and multivariate correlations of meteorological fields. These characteristics make the dynamical-generative framework a flexible, accurate, and efficient way to downscale large ensembles of climate projections, currently out of reach for pure dynamical downscaling.
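The generative stage is a conditional diffusion model that refines the dynamically downscaled field to the target resolution. Below is a minimal sketch of DDPM ancestral sampling conditioned on an intermediate-resolution field, with an untrained stand-in denoiser, so the output only exercises the sampler; the noise schedule and "network" are our assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_model(x_t, t, cond):
    """Stand-in for a trained conditional denoiser (untrained here, so the
    result is not meteorologically meaningful; it only exercises the loop)."""
    return 0.1 * (x_t - cond)

def ddpm_refine(cond, steps=50):
    """DDPM ancestral sampling conditioned on the intermediate-resolution field."""
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(cond.shape)
    for t in reversed(range(steps)):
        z = rng.standard_normal(cond.shape) if t > 0 else 0.0
        eps = eps_model(x, t, cond)
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = x + np.sqrt(betas[t]) * z
    return x

# Toy "intermediate-resolution" field from the dynamical stage, upsampled 4x.
coarse = rng.standard_normal((16, 16))
intermediate = np.kron(coarse, np.ones((4, 4)))  # nearest-neighbour upsampling
refined = ddpm_refine(intermediate)
print(refined.shape)
```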
Abstract:
Intuitively, the more complex a software system is, the harder it is to maintain. Statistically, it is not clear which complexity measures correlate with maintenance effort; in fact, it is not even clear how to objectively measure maintenance burden, so that developers’ sentiment and intuition can be supported by numbers. Without effective complexity and maintenance measures, it remains difficult to objectively monitor maintenance, control complexity, or justify refactoring. In this paper, we report a large-scale study of 1200+ projects written in C++ and Java from Google LLC. In this study, we collected three categories of measures: (1) architectural complexity, measured using propagation cost (PC), decoupling level (DL), and structural anti-patterns; (2) maintenance activity, measured using the number of changes, lines of code (LOC) written, and active coding time (ACT) spent on feature-addition vs. bug-fixing; and (3) developer sentiment on complexity and productivity, collected from 7200 survey responses. We statistically analyzed the correlations among these measures and obtained significant evidence of the following findings: (1) the more complex the architecture is (higher propagation cost, more instances of anti-patterns), the more LOC is spent on bug-fixing, rather than adding new features; (2) developers who commit more changes for features, spend more lines of code on features, or spend more time on features also feel that they are less hindered by technical debt and complexity. To the best of our knowledge, this is the first large-scale empirical study establishing the statistical correlation among architectural complexity, maintenance activity, and developer sentiment. The implication is that, instead of solely relying upon developer sentiment and intuitions to detect degraded structure or increased burden to evolve, it is possible to objectively and continuously measure and monitor architectural complexity and maintenance difficulty, increasing feature delivery efficiency by reducing architectural complexity and anti-patterns.
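Propagation cost, one of the architectural-complexity measures above, is conventionally defined as the density of the visibility (transitive-closure) matrix of the file dependency graph. A minimal sketch under that textbook definition; the toy dependency matrix is illustrative.

```python
import numpy as np

def propagation_cost(dep: np.ndarray) -> float:
    """Density of the visibility (reachability) matrix: the average fraction
    of files a change to one file can reach, directly or transitively."""
    n = dep.shape[0]
    visible = dep.astype(bool) | np.eye(n, dtype=bool)  # each file sees itself
    for k in range(n):  # boolean Floyd-Warshall transitive closure
        visible |= visible[:, [k]] & visible[[k], :]
    return visible.sum() / (n * n)

# Toy system: file 0 depends on 1, 1 depends on 2, plus an isolated file 3.
dep = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
print(f"propagation cost = {propagation_cost(dep):.2f}")  # 7/16 = 0.44
```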
On the relationship of speed limit and CO2 emissions in urban traffic
Tamás Tettamanti
Balázs Varga
Ori Rottenstreich
Transportation Research Interdisciplinary Perspectives, 32 (2025)
Abstract:
The paper analyzes the relationship between urban speed limits and vehicle emissions. There is an ongoing trend of reducing urban speed limits for the sake of increasing road safety. However, the impact of this policy on emissions is still unclear: it can be mixed depending on the proportion of dynamic and steady-state driving. While cruising emissions are higher at lower speeds, lower speeds entail less acceleration in urban traffic. Based on our investigation, one network topology feature (road length) and two traffic-related parameters (traffic volume and turning ratio) are suggested for analysis as the factors most relevant to vehicle emissions. Their correlation with potential emission reduction was evaluated using high-fidelity traffic simulation based on traffic scenarios validated with real traffic data. Random forest regression was used to support the optimal selection of zones for speed limit reduction. Traffic simulations on large urban networks prove that emission reductions of over 10% can be achieved in the case of a well-chosen speed limit policy.
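As a sketch of how random forest regression can rank candidate zones by predicted emission reduction from the three suggested features, assuming scikit-learn; the synthetic data below stands in for the paper's simulation outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical zones: road length (m), traffic volume (veh/h), turning ratio.
X = np.column_stack([
    rng.uniform(200, 2000, 300),
    rng.uniform(100, 1500, 300),
    rng.uniform(0.0, 0.5, 300),
])
# Synthetic stand-in for simulated CO2 reduction (%) after a speed-limit change.
y = 0.004 * X[:, 0] - 0.002 * X[:, 1] + 8 * X[:, 2] + rng.normal(0, 1, 300)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
candidates = X[:10]
ranked = np.argsort(model.predict(candidates))[::-1]
print("zones ranked by predicted emission reduction:", ranked)
```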
AI as a Catalyst for Educational Equity: Addressing Global Teacher Shortages and Learning Disparities
International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT) (2025)
Abstract:
The global education system is grappling with a critical shortage of teachers, threatening the achievement of universal quality education. This article examines how artificial intelligence (AI) technologies can revolutionize educational access and equity by addressing these systemic challenges. Through a comprehensive analysis of AI-enabled solutions, including personalized learning mechanisms, virtual tutoring systems, and intelligent content distribution platforms, the article explores the transformative potential of these technologies in democratizing education. It investigates the implementation of AI across established educational platforms, examining their effectiveness in providing adaptive learning experiences, breaking down language barriers, and ensuring cultural relevance. The article demonstrates that strategic AI integration can significantly improve learning outcomes while helping to bridge the global teacher shortage gap. It also addresses critical implementation challenges, providing policy recommendations and resource allocation frameworks for successful AI adoption in education systems worldwide. This analysis contributes to the growing body of knowledge on educational technology by offering practical insights into how AI can be leveraged to create more inclusive, effective, and accessible learning environments, ultimately advancing the goal of quality education for all.
How Unique is Whose Web Browser? The role of demographics in browser fingerprinting
Pritish Kamath
Robin Lassonde
2025
Abstract:
Web browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and resulting privacy risks have been studied for over a decade. Yet further research is limited because prior studies did not openly publish their data. Additionally, data in prior studies had biases and lacked user demographics.
Here we publish a first-of-its-kind open dataset that includes browser attributes with users' demographics, collected from 8,400 US study participants with their informed consent. As part of data collection, we also conducted an experiment to study what affects users' likelihood of sharing browser data for open research, in order to inform future data collection efforts, with survey responses from a total of 12,461 participants. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect.
In addition we demonstrate how fingerprinting risks differ across demographic groups. For example, we find lower income users are more at risk, and find that as users' age increases, they are both more likely to be concerned about fingerprinting and at real risk of fingerprinting. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level, ethnicity and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes most contribute to this risk.
Overall, we show the important role of user demographics in the ongoing work that intends to assess fingerprinting risks and improve user privacy, with findings to inform future privacy enhancing browser developments. The dataset and data collection tool we openly publish can be used to further study research questions not addressed in this work.
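Fingerprinting-risk analyses of this kind typically rest on per-attribute Shannon entropy and the fraction of users whose attribute combination is unique. A minimal sketch with toy values; the attributes and data are illustrative, not drawn from the published dataset.

```python
import math
from collections import Counter

def attribute_entropy(values: list[str]) -> float:
    """Shannon entropy (bits) of one browser attribute across users."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def unique_fraction(fingerprints: list[tuple]) -> float:
    """Share of users whose combined attributes appear exactly once."""
    counts = Counter(fingerprints)
    return sum(1 for c in counts.values() if c == 1) / len(fingerprints)

# Hypothetical attribute columns for five users.
user_agent = ["UA1", "UA1", "UA2", "UA3", "UA3"]
timezone = ["-5", "-5", "-8", "-5", "0"]
fingerprints = list(zip(user_agent, timezone))

print(f"user-agent entropy: {attribute_entropy(user_agent):.2f} bits")
print(f"unique fingerprints: {unique_fraction(fingerprints):.0%}")
```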
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Khushman Patel
Hans Gellersen
Ken Pfeuffer
IEEE VR (2025)
Abstract:
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use but suffer from input-precision issues. We present H2E, a fine- and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage of adding a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze-and-pinch interaction baseline.
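Fitts-based studies like the one above score pointing trials by index of difficulty and throughput. A worked example using the standard Shannon formulation; the distances and times are hypothetical.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)

def throughput(distance: float, width: float, movement_time_s: float) -> float:
    """Bits per second for one pointing trial."""
    return index_of_difficulty(distance, width) / movement_time_s

# Hypothetical trial: 30 cm reach to a 2 cm target, completed in 0.9 s.
print(f"ID = {index_of_difficulty(30, 2):.2f} bits, "
      f"TP = {throughput(30, 2, 0.9):.2f} bits/s")
```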
Development and Evaluation of ML Models for Cardiotocography Interpretation
Nicole Chiou
Nichole Young-Lin
Abdoulaye Diack
Christopher Kelly
Sanmi Koyejo
NPJ Women's Health (2025)
Abstract:
The inherent variability in the visual interpretation of cardiotocograms (CTGs) by obstetric clinical experts, both intra- and inter-observer, presents a substantial challenge in obstetric care. In response, we investigate automated CTG interpretation as a potential solution to enhance the early detection of fetal hypoxia during labor, thereby reducing unnecessary operative interventions and improving overall maternal and neonatal care. This study employs deep learning techniques to reduce the subjectivity associated with visual CTG interpretation. Our results demonstrate that employing objective cord blood pH measurements, rather than clinician-defined Apgar scores, yields more consistent and robust model performance. Additionally, through a series of ablation studies, we investigate the impact of temporal distribution shifts on the performance of these deep learning models. We examine tradeoffs between performance and fairness, specifically evaluating performance across demographic and clinical subgroups. Finally, we discuss the practical implications of our findings for the real-world deployment of such systems, emphasizing their potential utility in medical settings with limited resources.
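To make the labeling choice concrete: the models are trained against an objective cord-blood pH measurement rather than Apgar scores. A minimal sketch of such a setup, assuming PyTorch; the 7.05 threshold, network shape, and synthetic traces are our assumptions, not the study's architecture.

```python
import torch
from torch import nn

def label_from_ph(ph: torch.Tensor, threshold: float = 7.05) -> torch.Tensor:
    """Binary hypoxia label from cord-blood pH (threshold is illustrative)."""
    return (ph < threshold).float()

# Compact 1D CNN over a fetal-heart-rate trace (one channel, fixed length).
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1),
)

# Synthetic batch: 8 traces of 2400 samples (e.g., 10 min at 4 Hz).
fhr = torch.randn(8, 1, 2400)
ph = torch.tensor([7.3, 7.0, 7.2, 7.1, 6.9, 7.25, 7.02, 7.15])
loss = nn.BCEWithLogitsLoss()(model(fhr).squeeze(1), label_from_ph(ph))
loss.backward()  # one illustrative training step (optimizer omitted)
print(float(loss))
```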
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Hailey Joren
Jianyi Zhang
Chun-Sung Ferng
Ankur Taly
International Conference on Learning Representations (ICLR) (2025)
Abstract:
Augmenting LLMs with context leads to improved performance across many applications. Despite much research on Retrieval Augmented Generation (RAG) systems, an open question is whether errors arise because LLMs fail to utilize the context from retrieval or the context itself is insufficient to answer the query. To shed light on this, we develop a new notion of sufficient context, along with a method to classify instances that have enough information to answer the query. We then use sufficient context to analyze several models and datasets. By stratifying errors based on context sufficiency, we find that larger models with higher baseline performance (Gemini 1.5 Pro, GPT 4o, Claude 3.5) excel at answering queries when the context is sufficient, but often output incorrect answers instead of abstaining when the context is not. On the other hand, smaller models with lower baseline performance (Llama 3.1, Mistral 3, Gemma 2) hallucinate or abstain often, even with sufficient context. We further categorize cases where the context is useful and improves accuracy even though it does not fully answer the query and the model would err without it. Building on our findings, we explore ways to reduce hallucinations in RAG systems, including a new selective generation method that leverages sufficient-context information for guided abstention. Our method improves the fraction of correct answers among cases where the model responds by 2-10% for Gemini, GPT, and Gemma.
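The selective-generation idea combines a sufficiency signal with model confidence to decide when to abstain. A minimal sketch under loud assumptions: the keyword-overlap sufficiency check and the plug-in answer/confidence functions are stand-ins, not the paper's classifier.

```python
import re

def _terms(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def sufficient_context(query: str, context: str) -> bool:
    """Stand-in sufficiency classifier: a naive keyword-overlap heuristic,
    NOT the paper's method."""
    q = _terms(query)
    return len(q & _terms(context)) >= max(1, len(q) // 2)

def selective_generate(query, context, answer_fn, confidence_fn, tau=0.5):
    """Abstain unless the context looks sufficient or the model is confident."""
    if not sufficient_context(query, context) and confidence_fn(query, context) < tau:
        return "I don't know."
    return answer_fn(query, context)

# Hypothetical plug-ins; any LLM call and confidence estimate would do here.
print(selective_generate(
    "When was the Eiffel Tower completed?",
    "The Eiffel Tower, completed in 1889, is in Paris.",
    answer_fn=lambda q, c: "1889",
    confidence_fn=lambda q, c: 0.3,
))
```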