Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10406 publications
    Preview abstract Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for leveraging these powerful models. NL2SQL is challenging in that the natural language question is inherently ambiguous, while the SQL generation requires a precise understanding of complex data schema and semantics. One approach to this semantic ambiguous problem is to provide more and sufficient contextual information. In this work, we explore the performance and the latency trade-offs of the extended context window (a.k.a., long context) offered by Google's state-of-the-art LLM (\textit{gemini-1.5-pro}). We study the impact of various contextual information, including column example values, question and SQL query pairs, user-provided hints, SQL documentation, and schema. To the best of our knowledge, this is the first work to study how the extended context window and extra contextual information can help NL2SQL generation with respect to both accuracy and latency cost. We show that long context LLMs are robust and do not get lost in the extended contextual information. Additionally, our long-context NL2SQL pipeline based on Google's \textit{gemini-pro-1.5} achieve a strong performance with 67.41\% on BIRD benchmark (dev) without finetuning and expensive self-consistency based techniques. View details
    Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications
    Zhiwei Ren
    Junbo Li
    Minjia Zhang
    Di Wang
    Longfei Shangguan
    SenSys 2025 - The 23rd ACM Conference on Embedded Networked Sensor Systems (2025)
    Preview abstract This paper advocates for sensor-informed personal agents that can take advantage of sensor hints on wearables to enhance the personal agent's response. We demonstrate that such a sensor-in-the-loop design paradigm can be easily integrated into existing LLM agents by building a prototype named WellMax based on existing well-developed techniques such as structured prompt tuning and few-shot prompting. The head-to-head comparison with a non-sensor-informed agent across five use scenarios demonstrates that this sensor-in-the-loop design can effectively improve users' needs and their overall experience. The deep-dive into agents' replies and participants' feedback further reveals that sensor-in-the-loop agents not only provide more contextually relevant responses but also exhibit a greater understanding of user priorities and situational nuances. We further conduct two case studies to examine the potential pitfalls and distill key insights from this sensor-in-the-loop agent. We believe this work sets the stage for more intelligent, empathetic, and effective interactions in future AI-driven personal assistants. View details
    Preview abstract Recent work suggested utilizing inference compute, showing that scaling of number of samples consistently improves the fractions of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains - mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution. View details
    Test-Time Visual In-Context Tuning
    Jiahao Xie
    Nathalie Rauschmayr
    Bernt Schiele
    2025
    Preview abstract Visual in-context learning (VICL), as a new paradigm in computer vision, allows the model to rapidly adapt to various tasks with only a handful of prompts and examples. While effective, the existing VICL paradigm exhibits poor generalizability under distribution shifts. In this work, we propose test-time visual in-context tuning (VICT), a method that can learn adaptive VICL models on the fly with a single test sample. Specifically, We flip the role between task prompts and the test sample and use a cycle consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on seven representative vision tasks with 15 corruptions demonstrate that our VICT can improve the generalizability of VICL to unseen new domains View details
    Ransomware over Modern Web Browsers: A Novel Strain and A New Defense Mechanism
    Harun Oz
    Ahmet Aris
    Leonardo Babun
    Selcuk Uluagac
    Abbas Acar
    ACM Transactions on the Web (2025)
    Preview abstract Ransomware is an increasingly prevalent form of malware targeting end-users, governments, and businesses. As it has evolved, adversaries added new capabilities to their arsenal. Throughout the ransomware evolution, the adversaries propose a next-generation browser-based ransomware, RøB, that performs its malicious actions via emerging web technologies, File System Access API (FSA) and WebAssembly (Wasm). RøB uses this API through the victims’ browsers; hence, it does not require the victims to download and install malicious binaries. We performed extensive evaluations with 3 different OSs, 23 file formats, 29 distinct directories, 5 cloud providers, and 4 antivirus solutions. Our evaluations show that RøB can encrypt various types of files in the local and cloud-integrated directories, external storage devices, and network-shared folders of victims. Our experiments also reveal that popular cloud solutions, Box Individual and Apple iCloud can be severely affected by RøB. Moreover, we conducted tests with commercial antivirus software such as AVG, Avast, Kaspersky, Malware Bytes that perform sensitive directory and suspicious behavior monitoring against ransomware. We verified that RøB can evade these antivirus software and encrypt victim files. Moreover, existing ransomware detection solutions in the literature also cannot be a remedy against RøB due to its distinct features. Therefore, in this paper, we also propose broguard, a new detection system for RøB-like attacks. broguard monitors the web applications that use the FSA API via function hooking and uses a machine learning classifier to detect RøB-like attacks in real-time without any file loss. Performance evaluations of broguard on a comprehensive dataset show that broguard can detect RøB-like browser-based ransomware attacks with over 99% accuracy and minimal overhead. View details
    Preview abstract The global adoption of Large Language Models (LLMs) in healthcare shows promise for enhancing clinical workflows and improving patient outcomes. However, Automatic Speech Recognition (ASR) errors in critical medical entities remain a significant challenge. These errors can lead to severe consequences if undetected. This study investigates the prevalence and impact of ASR errors in medical transcription across Africa, Europe, and North America. By examining variations in accented English across three continents, we analyze the impact of regional speech patterns on ASR performance. Our research quantifies both the potential and limitations of LLMs in mitigating ASR inaccuracies within various medical settings, with particular attention to performance variations across regional accents and medical terminology. Our findings highlight significant disparities in ASR accuracy across regions and identify specific conditions under which LLM corrections prove most effective. View details
    On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container
    Tomasz Rudzki
    Gavin Kearney
    AES 158th Convention of the Audio Engineering Society (2025)
    Preview abstract Immersive Audio Media and Formats (IAMF), also known as Eclipsa Audio, is an open-source audio container developed to accommodate multichannel and scene-based audio formats. Headphone-based delivery of IAMF audio requires efficient binaural rendering. This paper introduces the Open Binaural Renderer (OBR), which is designed to render IAMF audio. It discusses the core rendering algorithm, the binaural filter design process as well as real-time implementation of the renderer in a form of an open-source C++ rendering library. Designed for multi-platform compatibility, the renderer incorporates a novel approach to binaural audio processing, leveraging a combination of spherical harmonic (SH) based virtual listening room model and anechoic binaural filters. Through its design, the IAMF binaural renderer provides a robust solution for delivering high-quality immersive audio across diverse platforms and applications. View details
    Beyond Digital Literacy: Building Youth Digital Resilience Through Existing “Information Sensibility” Practices
    Mia Hassoun
    Ian Beacock
    Todd Carmody
    Patrick Gage Kelley
    Beth Goldberg
    Devika Kumar
    Laura Murray
    Rebekah Park
    Behzad Sarmadi
    Social Sciences Journal, 14(4) (2025)
    Preview abstract Youth media consumption and disordered eating practices have historically been subjects of moral panics, often resulting in protective, deficit-based interventions like content removal. We argue for interventions which instead equip youth to evaluate and manage risks in their online environments, building upon their existing “information sensibility” practices. Drawing upon ethnographic research and intervention testing with 77 participants in the US and India, we analyze how youth (aged 13–26), including those with diverse political perspectives and those recovering from disordered eating (DE), engage with online news and health information. Participants generally algorithmically encountered (rather than searched for) information online, and their engagement was shaped more by social motivations—like belonging—than truth seeking. Participants interpreted online information collaboratively, relying on social cues and peer validation within their online communities. They demonstrated preference for personal testimonies and relatable sources, particularly those with similar social identities. We propose resilience-building interventions that build upon these youth online information practices by: (1) leveraging peer networks, promoting critical information engagement through collaborative learning and peer-to-peer support within online communities; (2) developing social media sensibility, equipping youth to critically evaluate information sources in situ; (3) providing pathways offline, connecting youth to desired in-person communities; and (4) encouraging probabilistic thinking. View details
    Preview abstract We discuss the challenges posed by growing machine learning workloads on datacenter networks and present how Google’s Jupiter network fabrics effectively support diverse traffic. View details
    RemapRoute: Local Remapping of Internet Path Changes
    renata cruz teixeira
    italo cunha
    Elverton Fazzion
    Darryl Veitch
    2025
    Preview abstract Several systems rely on traceroute to track a large number of Internet paths as they change over time. Monitoring systems perform this task by remapping paths periodically or whenever a change is detected. This paper shows that such complete remapping is inefficient, because most path changes are localized to a few hops of a path. We develop RemapRoute, a tool to remap a path locally given the previously known path and a change point. RemapRoute sends targeted probes to locate and remap the often few hops that have changed. Our evaluation with trace-driven simulations and in a real deployment shows that local remapping reduces the average number of probes issued during remapping by 63% and 79%, respectively, when compared with complete remapping. At the same time, our results show that local remapping has little impact on the accuracy of inferred paths. View details
    LLM-based Lossless Text Simplification and its Effect on User Comprehension and Cognitive Load
    Theo Guidroz
    Diego Ardila
    Jimmy Li
    Adam Mansour
    Paul Jhun
    Nina Gonzalez
    Xiang Ji
    Mike Sanchez
    Miguel Ángel Garrido
    Divyansh Choudhary
    Jay Hartford
    Georgina Xu
    Henry Serrano
    Yifan Wang
    Jeff Shaffer
    Eric (Yifan) Cao
    Sho Fujiwara
    Peggy Bui
    arXiv (2025)
    Preview abstract Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop a LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for finance (5.5%), aerospace/computer science (3.8%) domains, and legal (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility. View details
    Quartic Quantum Speedups for Planted Inference Problems
    Alexander Schmidhuber
    Ryan O'Donnell
    Physical Review X, 15 (2025), pp. 021077
    Preview abstract We describe a quantum algorithm for the Planted Noisy kXOR problem (also known as sparse Learning Parity with Noise) that achieves a nearly quartic (4th power) speedup over the best known classical algorithm while also only using logarithmically many qubits. Our work generalizes and simplifies prior work of Hastings, by building on his quantum algorithm for the Tensor Principal Component Analysis (PCA) problem. We achieve our quantum speedup using a general framework based on the Kikuchi Method (recovering the quartic speedup for Tensor PCA), and we anticipate it will yield similar speedups for further planted inference problems. These speedups rely on the fact that planted inference problems naturally instantiate the Guided Sparse Hamiltonian problem. Since the Planted Noisy kXOR problem has been used as a component of certain cryptographic constructions, our work suggests that some of these are susceptible to super-quadratic quantum attacks. View details
    Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
    Shun Liao
    Paolo Di Achille
    Jiang Wu
    Silviu Borac
    Jonathan Wang
    Eric Teasley
    Lawrence Cai
    Daniel McDuff
    Hao-Wei Su
    Brent Winslow
    Anupam Pathak
    Shwetak Patel
    Jim Taylor
    Jamie Rogers
    (2025)
    Preview abstract Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during ordinary smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions – the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) <10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error <5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring. View details
    Preview abstract Understanding fine-grained temporal dynamics is crucial in egocentric videos, where continuous streams capture frequent, close-up interactions with objects. In this work, we bring to light that current egocentric video question-answering datasets often include questions that can be answered using only few frames or commonsense reasoning, without being necessarily grounded in the actual video. Our analysis shows that state-of-the-art Multi-Modal Large Language Models (MLLMs) on these benchmarks achieve remarkably high performance using just text or a single frame as input. To address these limitations, we introduce EgoTempo, a dataset specifically designed to evaluate temporal understanding in the egocentric domain. EgoTempo emphasizes tasks that require integrating information across the entire video, ensuring that models would need to rely on temporal patterns rather than static cues or pre-existing knowledge. Extensive experiments on EgoTempo show that current MLLMs still fall short in temporal reasoning on egocentric videos, and thus we hope EgoTempo will catalyze new research in the field and inspire models that better capture the complexity of temporal dynamics. Dataset and code are available at https://github.com/google-research-datasets/egotempo.git. View details
    Automatic Synthesis of Specialized Hash Function
    Renato B Hoffmann
    Leonardo G Fae
    Fernando Magno Quintao Pereira
    Dalvan Grieber
    2025
    Preview abstract Hashing is a fundamental operation in various computer sci- ence applications. Despite the prevalence of specific key formats like social security numbers, MAC addresses, plate numbers, and URLs, hashing libraries typically treat them as general byte sequences. This paper introduces a technique for synthesizing specialized hash functions tailored to par- ticular byte formats. The proposed code generation method leverages three prevalent patterns: (i) fixed-length keys, (ii) keys with common subsequences, and (iii) keys ranging on predetermined sequences of bytes. The code generation pro- cess involves two algorithms: one identifies relevant regular expressions within key examples, and the other generates specialized hash functions based on these expressions. This approach, straightforward to implement, showcases improve- ments over highly optimized hash function implementations. Comparative analysis demonstrates that our synthetic func- tions outperform counterparts in the C++ Standard Template Library and the Google Abseil Library, achieving speedups ranging from 2% to 11%, depending on the key format. View details