Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10464 publications
Preview abstract
We introduce efficient differentially private (DP) algorithms for several linear algebraic tasks, including solving linear equalities over arbitrary fields, linear inequalities over the reals, and computing affine spans and convex hulls. As an application, we obtain efficient DP algorithms for learning halfspaces and affine subspaces. Our algorithms addressing equalities are strongly polynomial, whereas those addressing inequalities are weakly polynomial. Furthermore, this distinction is inevitable: no DP algorithm for linear programming can be strongly polynomial-time efficient.
View details
Ransomware over Modern Web Browsers: A Novel Strain and A New Defense Mechanism
Harun Oz
Ahmet Aris
Leonardo Babun
Selcuk Uluagac
Abbas Acar
ACM Transactions on the Web (2025)
Preview abstract
Ransomware is an increasingly prevalent form of malware targeting end-users, governments, and businesses. As it has evolved,
adversaries added new capabilities to their arsenal. Throughout the ransomware evolution, the adversaries propose a next-generation
browser-based ransomware, RøB, that performs its malicious actions via emerging web technologies, File System Access API (FSA) and
WebAssembly (Wasm). RøB uses this API through the victims’ browsers; hence, it does not require the victims to download and install
malicious binaries. We performed extensive evaluations with 3 different OSs, 23 file formats, 29 distinct directories, 5 cloud providers,
and 4 antivirus solutions. Our evaluations show that RøB can encrypt various types of files in the local and cloud-integrated directories,
external storage devices, and network-shared folders of victims. Our experiments also reveal that popular cloud solutions, Box
Individual and Apple iCloud can be severely affected by RøB. Moreover, we conducted tests with commercial antivirus software such
as AVG, Avast, Kaspersky, Malware Bytes that perform sensitive directory and suspicious behavior monitoring against ransomware.
We verified that RøB can evade these antivirus software and encrypt victim files. Moreover, existing ransomware detection solutions
in the literature also cannot be a remedy against RøB due to its distinct features. Therefore, in this paper, we also propose broguard,
a new detection system for RøB-like attacks. broguard monitors the web applications that use the FSA API via function hooking and
uses a machine learning classifier to detect RøB-like attacks in real-time without any file loss. Performance evaluations of broguard
on a comprehensive dataset show that broguard can detect RøB-like browser-based ransomware attacks with over 99% accuracy and
minimal overhead.
View details
Silent Data Corruption by 10× Test Escapes Threatens Reliable Computing
Subhasish Mitra
Rama Govindaraju
Eric Liu
IEEE Design & Test (2025) (to appear)
Preview abstract
Too many defective compute chips are escaping today’s manufacturing tests – at least an order of magnitude more than industrial targets across all compute chip types in data centers. Silent data corruptions (SDCs) caused by test escapes, when left unaddressed, pose a major threat to reliable computing. We present a three-pronged approach outlining future directions for overcoming test escapes: (a) Quick diagnosis of defective chips directly from system-level incorrect behaviors. Such diagnosis is critical for gaining insights into why so many defective chips escape existing manufacturing testing. (b) In-field detection of defective chips. (c) New test experiments to understand the effectiveness of new techniques for detecting defective chips. These experiments must overcome the drawbacks and pitfalls of previous industrial test experiments and case studies.
View details
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Shashank Kapoor
Aman Raj
IEEE Compsac 2025 (2025)
Preview abstract
Large Language Models (LLMs) are revolutionizing many areas of AI, but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: knowledge distillation, model quantization and model pruning. For each technique, we discuss the underlying principles, present different forms, and provide examples of successful applications. We also briefly discuss complementary techniques like mixture-of-experts and early exit strategies and highlight the promising future directions. We aim to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment. To the best of our knowledge, this is the first paper that provides a focused survey of LLM compression techniques from the lens of resource-constrained environments.
View details
LeakyFeeder: In-Air Gesture Control Through Leaky Acoustic Waves
Yongjie Yang
Tao Chen
Zhenlin An
Shirui Cao
Shangguan Longfei
SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
Preview abstract
We present LeekyFeeder, a mobile application that explores the acoustic signals leaked from headphones to reconstruct gesture motions around the ear for fine-grained gesture control. To achieve this goal, LeekyFeeder reuses the speaker and feed-forward microphones on active noise cancellation (ANC) headphones as a SONAR system, emitting an inaudible frequency-modulated continuous-wave (FMCW) signal to track gesture reflections over time. Since this single-receiver SONAR system is unable to differentiate reflection angles and further disentangle signal reflections from different gesture parts, we draw on principles of multi-modal learning to frame gesture motion reconstruction as a multi-modal translation task and propose a deep learning-based approach to fill the information gap between low-dimensional FMCW ranging readings and high-dimensional 3D hand movements. We implement LeekyFeeder on a pair of Google Pixel Buds and conduct experiments to examine the efficacy and robustness of LeekyFeeder in various conditions. Experiments based on six gesture types inspired by Apple Vision Pro demonstrate that LeekyFeeder achieves a PCK performance of 89% at 3cm across ten users, with an average MPJPE and MPJRPE error of 2.71cm and 1.88cm, respectively.
View details
Security Signals: Making Web Security Posture Measurable At Scale
David Dworken
Artur Janc
Santiago (Sal) Díaz
Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb)
Preview abstract
The area of security measurability is gaining increased attention, with a wide range of organizations calling for the development of scalable approaches for assessing the security of software systems and infrastructure. In this paper, we present our experience developing Security Signals, a comprehensive system providing security measurability for web services, deployed in a complex application ecosystem of thousands of web services handling traffic from billions of users. The system collects security-relevant information from production HTTP traffic at the reverse proxy layer, utilizing novel concepts such as synthetic signals augmented with additional risk information to provide a holistic view of the security posture of individual services and the broader application ecosystem. This approach to measurability has enabled large-scale security improvements to our services, including prioritized rollouts of security enhancements and the implementation of automated regression monitoring. Furthermore, it has proven valuable for security research and prioritization of defensive work. Security Signals addresses shortcomings of prior web measurability proposals by tracking a comprehensive set of security properties relevant to web applications, and by extracting insights from collected data for use by both security experts and non-experts. We believe the lessons learned from the implementation and use of Security Signals offer valuable insights for practitioners responsible for web service security, potentially inspiring new approaches to web security measurability.
View details
Preview abstract
Hashing is a fundamental operation in various computer sci-
ence applications. Despite the prevalence of specific key
formats like social security numbers, MAC addresses, plate
numbers, and URLs, hashing libraries typically treat them as
general byte sequences. This paper introduces a technique
for synthesizing specialized hash functions tailored to par-
ticular byte formats. The proposed code generation method
leverages three prevalent patterns: (i) fixed-length keys, (ii)
keys with common subsequences, and (iii) keys ranging on
predetermined sequences of bytes. The code generation pro-
cess involves two algorithms: one identifies relevant regular
expressions within key examples, and the other generates
specialized hash functions based on these expressions. This
approach, straightforward to implement, showcases improve-
ments over highly optimized hash function implementations.
Comparative analysis demonstrates that our synthetic func-
tions outperform counterparts in the C++ Standard Template
Library and the Google Abseil Library, achieving speedups
ranging from 2% to 11%, depending on the key format.
View details
PageFlex: Flexible and Efficient User-space Delegation of Linux Paging Policies with eBPF
Kan Wu
Zhiyuan Guo
Suli Yang
Rajath Shashidhara
Wei Xu
Alex Snoeren
Kim Keeton
2025
Preview abstract
To increase platform memory efficiency, hyperscalers like Google and Meta transparently demote “cold” application data to cheaper cost-per-byte memory tiers like compressed memory and NVMe SSDs. These systems rely on standard kernel paging policies and mechanisms to maximize the achievable memory savings without hurting application performance. Although the literature promises better policies, implementing and deploying them within the Linux kernel
is challenging. Delegating policies and mechanisms to user space, through userfaultfd or library-based approaches, incurs overheads and may require modifying application code. We present PageFlex, a framework for delegating Linux paging policies to user space with minimal overhead and full compatibility with existing real-world deployments. PageFlex uses eBPF to delegate policy decisions while providing low-overhead access to in-kernel memory state and access information, thus balancing flexibility and performance. Additionally, PageFlex supports different paging strategies for distinct memory regions and application phases. We show that
PageFlex can delegate existing kernel-based policies with little (< 1%) application slowdown, effectively realizing the benefits of state-of-the-art policies like Hyperbolic caching and Leap prefetching, and unlocking application-specific benefits through region- and phase-aware policy specialization.
View details
Preview abstract
Browser fingerprinting is an online tracking technique that is being increasingly adopted for profiling and ad targeting purposes. While prior work has analyzed the prevalence and impact of browser fingerprinting on the Web, they have traditionally relied on large-scale automated crawls. Naturally, these cannot replicate real-human interactions, e.g., solve CAPTCHAs, evade bot detectors, or operate behind login pages and paywalls. This prompts the question as to whether or not the fingerprinting ecosystem is appreciably different in real-world browsing sessions. In this paper, we begin to address this question by designing and conducting a user study aimed at collecting actual telemetry data from real browsing sessions of 30 users.
We find that almost half of the fingerprinting websites identified from real user browsing sessions are missed by equivalent automated crawls. This is mainly due to the inability of automated crawls to identify and visit authentication pages, being blocked by bot detectors, and/or failing to perform user interactions that specifically trigger browser fingerprinting scripts. We also find new fingerprinting vectors that are consistently present in fingerprinting scripts captured by real user browsing sessions yet missing from automated crawls. Finally, we assess the feasibility of collecting fingerprinting training data in a privacy-preserving way. We conclude that private models built on real user browsing sessions can detect browser fingerprinting more effectively than models trained on automated crawls alone, while simultaneously providing strong privacy guarantees to users.
View details
Preview abstract
In this article, we describe our human-centered research focused on understanding the role of collaboration and teamwork in productive software development. We describe creation of a logs-based metric to identify collaboration through observable events and a survey-based multi-item scale to assess team functioning.
View details
Origin-destination travel demand estimation: an approach that scales worldwide, and its application to five metropolitan highway networks
Christopher Bian
Yechen Li
Willa Ng
Bin Yan
Janny Zhang
Transportation Research Part B: Methodological (2025) (to appear)
Preview abstract
Estimating Origin-Destination (OD) travel demand is vital for effective urban planning
and traffic management. Developing universally applicable OD estimation
methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic
data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which
are often prerequisite for traditional approaches. Our proposed method directly
estimates OD travel demand by systematically leveraging aggregated, anonymized
statistics from Google Maps Traffic Trends, obviating the need for conventional census
or city-provided OD data. The OD demand is estimated by formulating a single-level,
one-dimensional, continuous nonlinear optimization problem with nonlinear equality
and bound constraints to replicate highway path travel times. The method achieves
efficiency and scalability by employing a differentiable analytical macroscopic network
model. This model by design is computationally lightweight, distinguished by its
parsimonious parameterization that requires minimal calibration effort and its capacity
for instantaneous evaluation. These attributes ensure the method's broad applicability
and practical utility across diverse cities globally. Using segment sensor counts from
Los Angeles and San Diego highway networks, we validate our proposed approach,
demonstrating a two-thirds to three-quarters improvement in the fit to segment count
data over a baseline. Beyond validation, we establish the method's scalability and
robust performance in replicating path travel times across diverse highway networks,
including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded
evaluations, our method not only aligns with simulation-based benchmarks but also
achieves an average 13% improvement in it's ability to fit travel time data compared to
the baseline during afternoon peak hours.
View details
Scaling Wearable Foundation Models
Girish Narayanswamy
Kumar Ayush
Yuzhe Yang
Orson Xu
Shun Liao
Shyam Tailor
Jake Sunshine
Tim Althoff
Shrikanth (Shri) Narayanan
Jiening Zhan
Mark Malhotra
Shwetak Patel
Samy Abdel-Ghaffar
Daniel McDuff
2025
Preview abstract
Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data. However, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of wearable sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, accelerometer, electrodermal activity, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation across both time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks including exercise and activity recognition.
View details
Preview abstract
This paper presents a novel framework for optimizing capacitor selection in electronic design using multi-objective linear and non-linear constrained optimization techniques. We demonstrate the effectiveness of this approach in minimizing cost and board area while meeting critical performance requirements.
View details
LLM-based Lossless Text Simplification and its Effect on User Comprehension and Cognitive Load
Theo Guidroz
Diego Ardila
Jimmy Li
Adam Mansour
Paul Jhun
Nina Gonzalez
Xiang Ji
Mike Sanchez
Miguel Ángel Garrido
Faruk Ahmed
Divyansh Choudhary
Jay Hartford
Georgina Xu
Henry Serrano
Yifan Wang
Jeff Shaffer
Eric (Yifan) Cao
Sho Fujiwara
Peggy Bui
arXiv (2025)
Preview abstract
Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop a LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for finance (5.5%), aerospace/computer science (3.8%) domains, and legal (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.
View details
Capturing Real-World Habitual Sleep Patterns with a Novel User-centric Algorithm to Pre-Process Fitbit Data in the All of Us Research Program: Retrospective observational longitudinal study
Hiral Master
Jeffrey Annis
Karla Gleichauf
Lide Han
Peyton Coleman
Kelsie Full
Neil Zheng
Doug Ruderfer
Logan Schneider
Evan Brittain
Journal of Medical Internet Research (2025)
Preview abstract
Background:
Commercial wearables such as Fitbit quantify sleep metrics using fixed calendar times as default measurement periods, which may not adequately account for individual variations in sleep patterns. To address this limitation, experts in sleep medicine and wearable technology developed a user-centric algorithm designed to more accurately reflect actual sleep behaviors and improve the validity of wearable-derived sleep metrics.
Objective:
This study aims to describe the development of a new user-centric algorithm, compare its performance with the default calendar-relative algorithm, and provide a practical guide for analyzing All of Us Fitbit sleep data on a cloud-based platform.
Methods:
The default and user-centric algorithms were implemented to preprocess and compute sleep metrics related to schedule, duration, and disturbances using high-resolution Fitbit sleep data from 8563 participants (median age 58.1 years, 6002/8341, 71.96%, female) in the All of Us Research Program (version 7 Controlled Tier). Variations in typical sleep patterns were calculated by examining the differences in the mean number of primary sleep logs classified by each algorithm. Linear mixed-effects models were used to compare differences in sleep metrics across quartiles of variation in typical sleep patterns.
Results:
Out of 8,452,630 total sleep logs collected over a median of 4.2 years of Fitbit monitoring, 401,777 (4.75%) nonprimary sleep logs identified by the default algorithm were reclassified as primary sleep by the user-centric algorithm. Variation in typical sleep patterns ranged from –0.08 to 1. Among participants with the greatest variation in typical sleep patterns, the user-centric algorithm identified significantly more total sleep time (by 17.6 minutes; P<.001), more wake after sleep onset (by 13.9 minutes; P<.001), and lower sleep efficiency (by 2.0%; P<.001), on average. Differences in sleep stage metrics between the 2 algorithms were modest.
Conclusions:
The user-centric algorithm captures the natural variability in sleep schedules, providing an alternative approach to preprocess and evaluate sleep metrics related to schedule, duration, and disturbances. A publicly available R package facilitates the implementation of this algorithm for clinical and translational research.
View details