Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10475 publications
Preview abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for leveraging these powerful models.
NL2SQL is challenging in that the natural language question is inherently ambiguous, while the SQL generation requires a precise understanding of complex data schema and semantics. One approach to this semantic ambiguous problem is to provide more and sufficient contextual information.
In this work, we explore the performance and the latency trade-offs of the extended context window (a.k.a., long context) offered by Google's state-of-the-art LLM (\textit{gemini-1.5-pro}).
We study the impact of various contextual information, including column example values, question and SQL query pairs, user-provided hints, SQL documentation, and schema. To the best of our knowledge, this is the first work to study how the extended context window and extra contextual information can help NL2SQL generation with respect to both accuracy and latency cost.
We show that long context LLMs are robust and do not get lost in the extended contextual information. Additionally, our long-context NL2SQL pipeline based on Google's \textit{gemini-pro-1.5} achieve a strong performance with 67.41\% on BIRD benchmark (dev) without finetuning and expensive self-consistency based techniques.
View details
Mufu: Multilingual Fused Learning for Low- Resource Translation with LLM
Zheng Lim
Honglin Yu
Trevor Cohn
International Conference on Learning Representations (ICLR) 2025
Preview abstract
Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a postediting one, and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show LLMs finetuned against Mufu-style prompts are robust to poor quality auxiliary translation candidates, achieving performance superior to NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining on average 3.1 chrF improvement over finetune-only baseline in low-resource translations.
View details
An Empirical Study of Time of Day Breakpoints in Traffic Light Plans
Eliav Buchnik
Tom Kalvari
Jack Haddad
Dan Karliner
Danny Veikherman
Shai Ferster
Ori Rottenstreich
2025
Preview abstract
Fixed time strategy is a common approach in signal traffic control in which signal plans are simple and periodic, enjoying easy implementation without detection mechanisms. A traffic light is associated with several daily plans, each applied to several consecutive hours. Time-of-day breakpoints (TODs) refer to the times over the day in which the plan is changed. TODs are often selected based on traffic, aiming to divide the day into groups of consecutive hours with similar traffic characteristics within each group of hours. We present a methodology to study time-of-day breakpoints in practice. We use this methodology to estimate and analyze time-of-day breakpoints in the city of Rio de Janeiro, Brazil based on traffic properties derived from traffic trajectories. Our study examines over 900 of the city intersections. We refer to properties such as the number of daily plans and the times by which plans start. We also provide traffic-aware insights on the potential improvement in the selection of TODs and identify key intersections where adjusting TODs could reduce average delay times. We identify potential improvements in over 8% of the examined intersections. These findings provide valuable insights for traffic engineers seeking to optimize signal timing.
View details
User-Centered Delivery of AI-Powered Health Care Technologies in Clinical Settings: Mixed Methods Case Study
Randall Brandt
Hien Brown
Christine Silva
JMIR Human Factors (2025)
Preview abstract
Background:
Providers spend a large percentage of their day using electronic health record (EHR) technology and frequently report frustration when EHR tasks are time-consuming and effortful. To solve these challenges, artificial intelligence (AI)–based enhancements to EHR technology are increasingly being deployed. However, AI-based implementations for EHR features often lack user-centered evaluation.
Objective:
This study evaluates, using a user-centered approach, the implementation of an AI-powered search and clinical discovery tool within an EHR system.
Methods:
We conducted a mixed methods study consisting of interviews, observations, and surveys for 5 months.
Results:
High adoption rates for the AI-based features (163/176, 93% users after 3 months) and significant increases across key metrics, including user satisfaction (U=49; P<.001) and perception of time saved (U=49; P<.001), demonstrated that the AI-based features were not only successfully integrated into various clinical workflows but also improved the user experience for clinicians.
Conclusions:
Our results underscore the feasibility and effectiveness of using a user-centered approach for the deployment of clinical AI tools. High adoption rates and positive user experiences were driven by our user-centered research program, which emphasized close collaboration with users, rapid incorporation of feedback, and tailored user training. This study program can be used as a starting framework for the design and integration of human-centered research methods for AI tool deployment in clinical settings.
View details
Toward Community- Led Evaluations of Text-to-Image AI Representations of Disability, Health, and Accessibility
Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO) (2025)
Preview abstract
Responsible AI advocates for user evaluations, particularly when concerning people with disabilities, health conditions, and accessibility needs ( DHA)–wide- ranging but umbrellaed sociodemograph- ics. However, community- centered text- to- image AI’s ( T2I) evaluations are often researcher- led, situating evaluators as consumers. We instead recruited 21 people with diverse DHA to evaluate T2I by writing and editing their own T2I prompts with their preferred language and topics, in a method mirroring everyday use. We contribute user- generated terminology categories which inform future research and data collections, necessary for developing authentic scaled evaluations. We additionally surface yet- discussed DHA AI harms intersecting race and class, and participants shared harm impacts they experienced as image- creator evaluators. To this end, we demonstrate that prompt engineering– proposed as a misrepresentation mitigation– was largely ineffective at improving DHA representations. We discuss the importance of evaluator agency to increase ecological validity in community- centered evaluations, and opportunities to research iterative prompting as an evaluation technique.
View details
LLM-based Lossless Text Simplification and its Effect on User Comprehension and Cognitive Load
Theo Guidroz
Diego Ardila
Jimmy Li
Adam Mansour
Paul Jhun
Nina Gonzalez
Xiang Ji
Mike Sanchez
Miguel Ángel Garrido
Faruk Ahmed
Divyansh Choudhary
Jay Hartford
Georgina Xu
Henry Serrano
Yifan Wang
Jeff Shaffer
Eric (Yifan) Cao
Sho Fujiwara
Peggy Bui
arXiv (2025)
Preview abstract
Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop a LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for finance (5.5%), aerospace/computer science (3.8%) domains, and legal (3.5%). Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.
View details
A Strategic Framework for AI Product Development and Evaluation in Enterprise Software
International Journal of Computer Engineering and Technology (IJCET), Volume 16, Issue 1 (2025)
Preview abstract
This article presents a comprehensive framework for developing and evaluating AI products in enterprise software systems, addressing the critical challenges organizations face during AI transformation initiatives. The article introduces a structured approach to decision-making for AI integration, encompassing ROI evaluation, user value assessment, and business impact analysis. It establishes distinct methodologies for both assistive and autonomous AI systems, providing detailed metrics for measuring success and performance across different implementation scenarios. Across various industries, the framework has shown potential in reducing implementation time, increasing user adoption rates, and enhancing overall project success rates, highlighting its practical applicability. The article methodology combines theoretical analysis with practical case studies, resulting in a flexible yet robust framework that can adapt to various organizational contexts. The framework's primary contribution lies in its practical approach to bridging the gap between theoretical AI capabilities and real-world implementation challenges, offering product leaders a systematic methodology for AI product development and evaluation. By addressing both current implementation challenges and future scalability requirements, this framework provides organizations with a foundational tool for navigating their AI transformation journey while maintaining a focus on measurable business outcomes and user value creation.
View details
Preview abstract
This tutorial examines the progress and scaling limitations of IM-DD based optical technologies and explores how datacenter use cases optimized coherent technology, including a newly proposed polarization-folding, time-diversity approach and a novel single-sideband coherent detection technology—can address some of these challenges
View details
Silent Data Corruption by 10× Test Escapes Threatens Reliable Computing
Rama Govindaraju
Eric Liu
Subhasish Mitra
Mike Fuller
IEEE (2025) (to appear)
Preview abstract
Summary:
Silent Data Corruption by 10x Test Escapes Threatens Reliable Computing" highlights a critical issue: manufacturing defects, dubbed "test escapes," are evading current testing methods at an alarming rate, ten times higher than industry targets. These defects lead to Silent Data Corruption (SDC), where applications produce incorrect outputs without error indications, costing companies significantly in debugging, data recovery, and service disruptions. The paper proposes a three-pronged approach: quick diagnosis of defective chips directly from system-level behaviors, in-field detection using advanced testing and error detection techniques like CASP, and new, rigorous test experiments to validate these solutions and improve manufacturing testing practices.
View details
MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes
Nilay Pande
Sahiti Yerramilli
Jayant Tamarapalli
Rynaa Grover
(2025)
Preview abstract
A key frontier for Multimodal Large Language Models (MLLMs) is the ability to perform deep mathematical and spatial reasoning directly from images, moving beyond their established success in semantic description. Mathematical surface plots provide a rigorous testbed for this capability, as they isolate the task of reasoning from the semantic noise common in natural images. To measure progress on this frontier, we introduce MaRVL (Mathematical Reasoning over Visual Landscapes), a new benchmark designed to quantitatively evaluate these core reasoning skills. The benchmark comprises two novel tasks: Topological Counting, identifying and enumerating features like local maxima; and Transformation Recognition, recognizing applied geometric transformations. Generated from a curated library of functions with rigorous ambiguity filtering, our evaluation on MaRVL reveals that even state-of-the-art MLLMs struggle significantly, often resorting to superficial heuristics instead of robust spatial reasoning. MaRVL provides a challenging new tool for the research community to measure progress, expose model limitations, and guide the development of MLLMs with more profound reasoning abilities.
View details
Preview abstract
Measuring software development can help drive impactful change. However, it’s a complex task, and getting started can be daunting as it involves understanding what you should measure, and determining what you can measure. This article provides a guide to selecting a framework that aligns with organizational measurement strategy.
View details
Data-Driven Mechanism Design: Jointly Eliciting Preferences and Information
Dirk Bergemann
Marek Bojko
Paul Duetting
Haifeng Xu
EC '25: Proceedings of the 26th ACM Conference on Economics and Computation (2025), pp. 507
Preview abstract
We study mechanism design when agents have private preferences and private information about a common payoff-relevant state. We show that standard message-driven mechanisms cannot implement socially efficient allocations when agents have multidimensional types, even under favorable conditions.
To overcome this limitation, we propose data-driven mechanisms that leverage additional post-allocation information, modeled as an estimator of the payoff-relevant state. Our data-driven mechanisms extend the classic Vickrey-Clarke-Groves class. We show that they achieve exact implementation in posterior equilibrium when the state is either fully revealed or the utility is affine in an unbiased estimator. We also show that they achieve approximate implementation with a consistent estimator, converging to exact implementation as the estimator converges, and present bounds on the convergence rate.
We demonstrate applications to digital advertising auctions and large language model (LLM)-based mechanisms, where user engagement naturally reveals relevant information.
View details
Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025)
Preview abstract
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interview with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones, however identifying accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
View details
Perch 2.0: The Bittern Lesson for Bioacoustics
Bart van Merriënboer
Tom Denton
arXiv (2025)
Preview abstract
Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.
View details
Preview abstract
This paper discusses the migration of data orchestration workflows from a legacy tool like Autosys to a modern, cloud - based solution, Google Cloud Composer. It explores the transition from traditional job scheduling to Directed Acyclic Graph (DAG) - based workflows using Apache Airflow, culminating in the deployment and management of these workflows in Cloud Composer. The benefits and challenges of this migration are examined, highlighting the advantages of scalability, flexibility, and cloud integration offered by Cloud Composer.
View details