Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10381 publications
    Preview abstract This IEEE Spectrum article reflects on advocacy for U.S. technological leadership during my Congressional visit through IEEE-USA. Leading an expert group of other distinguished IEEE members, we urged lawmakers to support critical initiatives. Key priorities included sustained funding for federal research institutions like NIST, NASA, and the NSF, reauthorizing the SBIR/STTR programs vital for small business innovation, and passing the CREATE AI Act to democratize AI resources by establishing the National AI Research Resource (NAIRR). We also emphasized strengthening the STEM talent pipeline through the CHIPS and Science Act and expanding high-skilled immigrant visas. We highlighted rapid AI advancements, such as autonomous vehicles and the surge in FDA-approved AI-based medical devices, as underscoring the need for these strategic investments and policy actions. The article conveys a sense of urgency, calling for concrete congressional action to ensure the U.S. maintains its technological edge while also sharing my personal experiences. View details
    Preview abstract The global adoption of Large Language Models (LLMs) in healthcare shows promise for enhancing clinical workflows and improving patient outcomes. However, Automatic Speech Recognition (ASR) errors in critical medical entities remain a significant challenge. These errors can lead to severe consequences if undetected. This study investigates the prevalence and impact of ASR errors in medical transcription across Africa, Europe, and North America. By examining variations in accented English across three continents, we analyze the impact of regional speech patterns on ASR performance. Our research quantifies both the potential and limitations of LLMs in mitigating ASR inaccuracies within various medical settings, with particular attention to performance variations across regional accents and medical terminology. Our findings highlight significant disparities in ASR accuracy across regions and identify specific conditions under which LLM corrections prove most effective. View details
    Preview abstract Recent work suggested utilizing inference compute, showing that scaling the number of samples consistently improves the fraction of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains: mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution. View details
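The coverage metric named in the abstract above (the fraction of problems solved by at least one attempt) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `coverage`, the boolean input layout, and the toy data are all assumptions.

```python
def coverage(results, k):
    """Fraction of problems solved by at least one of the first k attempts.

    `results` holds one list of booleans per problem (hypothetical layout):
    True marks an attempt that produced a correct answer.
    """
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Toy data (made up for illustration): 4 problems, 3 attempts each.
toy_results = [
    [False, True, False],
    [False, False, False],
    [True, False, False],
    [False, False, True],
]

for k in (1, 2, 3):
    print(k, coverage(toy_results, k))
```

Because coverage can only stay flat or rise as k grows, a baseline that enumerates plausible answers under the same budget, as the abstract recommends, helps show how much of the gain comes from the model.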
    Preview abstract Storage on Android has evolved significantly over the years, with each new Android version introducing changes aimed at enhancing usability, security, and privacy. While these updates typically help with restricting app access to storage through various mechanisms, they may occasionally introduce new complexities and vulnerabilities. A prime example is the introduction of scoped storage in Android 10, which fundamentally changed how apps interact with files. While intended to enhance user privacy by limiting broad access to shared storage, scoped storage has also presented developers with new challenges and potential vulnerabilities to address. However, despite its significance for user privacy and app functionality, no systematic studies have been performed to study Android’s scoped storage at depth from a security perspective. In this paper, we present the first systematic security analysis of the scoped storage mechanism. To this end, we design and implement a testing tool, named ScopeVerif, that relies on differential analysis to uncover security issues and implementation inconsistencies in Android’s storage. Specifically, ScopeVerif takes a list of security properties and checks if there are any file operations that violate any security properties defined in the official Android documentation. Additionally, we conduct a comprehensive analysis across different Android versions as well as a cross-OEM analysis to identify discrepancies in different implementations and their security implications. Our study identifies both known and unknown issues of scoped storage. Our cross-version analysis highlights undocumented changes as well as partially fixed security loopholes across versions. Additionally, we discovered several vulnerabilities in scoped storage implementations by different OEMs. These vulnerabilities stem from deviations from the documented and correct behavior, which potentially poses security risks. 
The affected OEMs and Google have acknowledged our findings and offered us bug bounties in response. View details
    Preview abstract Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kutta (RK) methods provide a family of very powerful explicit and implicit high-order ODE solvers. However, these higher-order solvers have not found wide application in deep learning so far. In this work, we evaluate the performance of higher-order RK solvers when applied in deep learning, study their limitations, and propose ways to overcome these drawbacks. In particular, we explore how to improve their performance by naturally incorporating key ingredients of modern neural network optimizers such as preconditioning, adaptive learning rates, and momentum. View details
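The correspondence described in the abstract above, where gradient descent is the Euler method applied to the gradient flow equation dx/dt = -∇f(x), can be illustrated with a small sketch. The toy loss f(x) = 0.5·x², the step size, and all function names are assumptions chosen for illustration and are not taken from the paper. The higher-order step is the classical fourth-order Runge-Kutta scheme, not the authors' modified optimizer.

```python
import math


def grad(x):
    # Gradient of the toy loss f(x) = 0.5 * x**2 (assumed for illustration).
    return x


def euler_step(x, h):
    # One explicit Euler step on dx/dt = -grad(x).
    # This is exactly a gradient descent update with learning rate h.
    return x - h * grad(x)


def rk4_step(x, h):
    # One classical fourth-order Runge-Kutta step on the same equation.
    k1 = -grad(x)
    k2 = -grad(x + 0.5 * h * k1)
    k3 = -grad(x + 0.5 * h * k2)
    k4 = -grad(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(step, x0, h, n):
    # Apply `step` n times starting from x0.
    x = x0
    for _ in range(n):
        x = step(x, h)
    return x


x0, h, n = 1.0, 0.1, 10
exact = x0 * math.exp(-h * n)  # closed-form gradient flow solution for this loss
euler = integrate(euler_step, x0, h, n)
rk4 = integrate(rk4_step, x0, h, n)
print("Euler error:", abs(euler - exact))
print("RK4 error:  ", abs(rk4 - exact))
```

On this toy problem the RK4 iterate tracks the continuous gradient flow much more closely than Euler does at the same step size, at the cost of four gradient evaluations per step instead of one.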
    Preview abstract Artificial Intelligence (AI) is rapidly expanding and integrating more into daily life to automate tasks, guide decision-making and enhance efficiency. However, complex AI models, which make decisions without providing clear explanations (known as the "black-box problem"), currently restrict trust and widespread adoption of AI. Explainable Artificial Intelligence (XAI) has emerged to address the black-box problem by making AI systems more interpretable and transparent so stakeholders can trust, verify, and act upon AI-based outcomes. Researchers have come up with various techniques to foster XAI in the Software Development Lifecycle. However, there are gaps in the application of XAI in Software Engineering phases. Literature shows that 68% of XAI in Software Engineering research focused on maintenance as opposed to 8% on software management and requirements [7]. In this paper we present a comprehensive survey of the applications of XAI methods (e.g., concept-based explanations, LIME/SHAP, rule extraction, attention mechanisms, counterfactual explanations, example-based explanations) to the different phases of the Software Development Lifecycle (SDLC), mainly requirements elicitation, design and development, testing and deployment, and evolution. To the best of our knowledge, this paper presents the first comprehensive survey of XAI techniques for every phase of the SDLC. In doing so, we aim to promote explainable AI in Software Engineering and facilitate the use of complex AI models in AI-driven software development. View details
    Binamix -- A Python Library for Generating Binaural Audio Datasets
    Dan Barry
    Davoud Shariat Panah
    Alessandro Ragano
    Andrew Hines
    AES 158th Audio Engineering Society Convention (2025) (to appear)
    Preview abstract The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions to generate binaural audio data sets for use in testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides Head Related Impulse Response (HRIR) and Binaural Room Impulse Response (BRIR) data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library’s capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject Impulse Responses (IRs), speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation. We release the library under the Apache 2.0 License at https://github.com/QxLabIreland/Binamix/ View details
    Validation of a Deep Learning Model for Diabetic Retinopathy on Patients with Young-Onset Diabetes
    Tony Tan-Torres
    Pradeep Praveen
    Divleen Jeji
    Arthur Brant
    Xiang Yin
    Lu Yang
    Tayyeba Ali
    Ilana Traynis
    Dushyantsinh Jadeja
    Rajroshan Sawhney
    Sunny Virmani
    Pradeep Venkatesh
    Nikhil Tandon
    Ophthalmology and Therapy (2025)
    Preview abstract Introduction: While many deep learning systems (DLSs) for diabetic retinopathy (DR) have been developed and validated on cohorts with an average age in the 50s or older, fewer studies have examined younger individuals. This study aimed to understand DLS performance for younger individuals, who tend to display anatomic differences, such as prominent retinal sheen. This sheen can be mistaken for exudates or cotton wool spots, and potentially confound DLSs. Methods: This was a prospective cross-sectional cohort study in a “Diabetes of young” clinic in India, enrolling 321 individuals between ages 18 and 45 (98.8% with type 1 diabetes). Participants had fundus photographs taken and the photos were adjudicated by experienced graders to obtain reference DR grades. We defined a younger cohort (age 18–25) and an older cohort (age 26–45) and examined differences in DLS performance between the two cohorts. The main outcome measures were sensitivity and specificity for DR. Results: Eye-level sensitivity for moderate-or-worse DR was 97.6% [95% confidence interval (CI) 91.2, 98.2] for the younger cohort and 94.0% [88.8, 98.1] for the older cohort (p = 0.418 for difference). The specificity for moderate-or-worse DR significantly differed between the younger and older cohorts, 97.9% [95.9, 99.3] and 92.1% [87.6, 96.0], respectively (p = 0.008). Similar trends were observed for diabetic macular edema (DME); sensitivity was 79.0% [57.9, 93.6] for the younger cohort and 77.5% [60.8, 90.6] for the older cohort (p = 0.893), whereas specificity was 97.0% [94.5, 99.0] and 92.0% [88.2, 95.5] (p = 0.018). Retinal sheen presence (94% of images) was associated with DME presence (p < 0.0001). Image review suggested that sheen presence confounded reference DME status, increasing noise in the labels and depressing measured sensitivity. The gradability rate for both DR and DME was near-perfect (99% for both).
Conclusion: DLS-based DR screening performed well in younger individuals aged 18–25, with comparable sensitivity and higher specificity compared to individuals aged 26–45. Sheen presence in this cohort made identification of DME difficult for graders and depressed measured DLS sensitivity; additional studies incorporating optical coherence tomography may improve accuracy of measuring DLS DME sensitivity. View details
    Scaling Wearable Foundation Models
    Girish Narayanswamy
    Kumar Ayush
    Yuzhe Yang
    Orson Xu
    Shun Liao
    Shyam Tailor
    Jake Sunshine
    Tim Althoff
    Shrikanth (Shri) Narayanan
    Jiening Zhan
    Mark Malhotra
    Shwetak Patel
    Samy Abdel-Ghaffar
    Daniel McDuff
    2025
    Preview abstract Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data. However, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of wearable sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, accelerometer, electrodermal activity, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation across both time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks including exercise and activity recognition. View details
    Preview abstract Users of routing services like Apple Maps, Google Maps, and Waze frequently wonder why a given route is proposed. This question particularly arises when dynamic conditions like traffic and road closures cause unusual routes to be proposed. While many such dynamic conditions may exist in a road network at any time, only a small fraction of those conditions are typically relevant to a given user's route. In this work, we give a simple algorithm that identifies a small set of traffic-laden road segments that answer the following question: Which traffic conditions cause a particular shortest traffic-aware route to differ from the shortest traffic-free route? We theoretically and experimentally show that our algorithm generates small and interpretable answers to this question. View details
    Preview abstract This paper adopts a Usage-Based Construction Grammar perspective to compare human- and AI-generated language, focusing on Verb-Argument Constructions (VACs) as a lens for analysis. Specifically, we examine solicited advice texts in two domains (Finance and Medicine) produced by humans and ChatGPT across different GPT models (3.5, 4, and 4o) and interfaces (3.5 Web vs. 3.5 API). Our findings reveal broad consistency in the frequency and distribution of the most common VACs across human- and AI-generated texts, though ChatGPT exhibits a slightly higher reliance on the most frequent constructions. A closer examination of the verbs occupying these constructions uncovers significant differences in the meanings conveyed, with a notable growth away from human-like language production in macro-level perspectives (e.g., length) and towards human-like verb-VAC patterns with newer models. These results underscore the potential of VACs as a powerful tool for analyzing AI-generated language and tracking its evolution over time. View details
    Preview abstract Large Language Models (LLMs) are revolutionizing many areas of AI, but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: knowledge distillation, model quantization and model pruning. For each technique, we discuss the underlying principles, present different forms, and provide examples of successful applications. We also briefly discuss complementary techniques like mixture-of-experts and early exit strategies and highlight the promising future directions. We aim to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment. To the best of our knowledge, this is the first paper that provides a focused survey of LLM compression techniques from the lens of resource-constrained environments. View details
    Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
    Shun Liao
    Paolo Di Achille
    Jiang Wu
    Silviu Borac
    Jonathan Wang
    Eric Teasley
    Lawrence Cai
    Daniel McDuff
    Hao-Wei Su
    Brent Winslow
    Anupam Pathak
    Shwetak Patel
    Jim Taylor
    Jamie Rogers
    (2025)
    Preview abstract Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during ordinary smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions – the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) <10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error <5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring. View details
    Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
    Mauro Comi
    Max Yang
    Jonathan Tremblay
    Valts Blukis
    Yijiong Lin
    Nathan Lepora
    Laurence Aitchison
    2025
    Preview abstract Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g. shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct specular highlights with fidelity. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality. View details
    Preview abstract Many AI applications of interest require specialized multi-modal models. Yet, relevant data for training these models is inherently scarce. Human annotation is prohibitively expensive, error-prone, and time-consuming. Meanwhile, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution - limiting scalability and control. In this paper, we introduce Simula, a novel, seedless framework that balances global and local reasoning to generate synthetic datasets. We utilize taxonomies to capture a global coverage space and use a series of agentic refinements to promote local diversity and complexity. Our approach allows users to define desired dataset characteristics through an explainable and controllable process, without relying on seed data. This unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount. View details