Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10465 publications
    Scaling Laws for Downstream Task Performance in Machine Translation
    Natalia Ponomareva
    Hussein Hazimeh
    Sanmi Koyejo
    International Conference on Learning Representations (ICLR) (2025) (to appear)
    Preview abstract Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the \emph{pretraining} data and its size affect downstream performance (translation quality) as judged by: downstream cross-entropy and translation quality metrics such as BLEU and COMET scores. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and translation quality scores improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream translation quality metrics with good accuracy using a log-law. However, there are cases where moderate misalignment causes the downstream translation scores to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these, we provide new practical insights for choosing appropriate pretraining data. View details
    Automatic Synthesis of Specialized Hash Function
    Renato B Hoffmann
    Leonardo G Fae
    Fernando Magno Quintao Pereira
    Dalvan Grieber
    2025
    Preview abstract Hashing is a fundamental operation in various computer sci- ence applications. Despite the prevalence of specific key formats like social security numbers, MAC addresses, plate numbers, and URLs, hashing libraries typically treat them as general byte sequences. This paper introduces a technique for synthesizing specialized hash functions tailored to par- ticular byte formats. The proposed code generation method leverages three prevalent patterns: (i) fixed-length keys, (ii) keys with common subsequences, and (iii) keys ranging on predetermined sequences of bytes. The code generation pro- cess involves two algorithms: one identifies relevant regular expressions within key examples, and the other generates specialized hash functions based on these expressions. This approach, straightforward to implement, showcases improve- ments over highly optimized hash function implementations. Comparative analysis demonstrates that our synthetic func- tions outperform counterparts in the C++ Standard Template Library and the Google Abseil Library, achieving speedups ranging from 2% to 11%, depending on the key format. View details
    Preview abstract Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), have demonstrated significant potential in clinical reasoning skills such as history-taking and differential diagnosis generation—critical aspects of medical education. This work explores how LLMs can augment medical curricula through interactive learning. We conducted a participatory design process with medical students, residents and medical education experts to co-create an AI-powered tutor prototype for clinical reasoning. As part of the co-design process, we conducted a qualitative user study, investigating learning needs and practices via interviews, and conducting concept evaluations through interactions with the prototype. Findings highlight the challenges learners face in transitioning from theoretical knowledge to practical application, and how an AI tutor can provide personalized practice and feedback. We conclude with design considerations, emphasizing the importance of context-specific knowledge and emulating positive preceptor traits, to guide the development of AI tools for medical education. View details
    Confidence Improves Self-Consistency in LLMs
    Tom Sheffer
    Eran Ofek
    Ariel Goldstein
    Zorik Gekhman
    ACL 2025, Findings (2025)
    Preview abstract Self-consistency decoding enhances LLMs’ performance on reasoning tasks by sampling diverse reasoning paths and selecting the most frequent answer. However, it is computationally expensive, as sampling many of these (lengthy) paths is required to increase the chances that the correct answer emerges as the most frequent one. To address this, we introduce Confidence-Informed Self-Consistency (CISC). CISC performs a weighted majority vote based on confidence scores obtained directly from the model. By prioritizing high-confidence paths, it can identify the correct answer with a significantly smaller sample size. When tested on nine models and four datasets, CISC outperforms self-consistency in nearly all configurations, reducing the required number of reasoning paths by over 40% on average. In addition, we introduce the notion of within-question confidence evaluation, after showing that standard evaluation methods are poor predictors of success in distinguishing correct and incorrect answers to the same question. In fact, the most calibrated confidence method proved to be the least effective for CISC. Lastly, beyond these practical implications, our results and analyses show that LLMs can effectively judge the correctness of their own outputs, contributing to the ongoing debate on this topic. View details
    Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF
    Carlos Tejeda-Ocampo
    Toni Hirvonen
    Ema Souza-Blanes
    Mahmoud Namazi
    AES 158th Convention of the Audio Engineering Society (2025)
    Preview abstract Immersive audio mix presentations involve transmitting and rendering several audio elements simultaneously. This enables next-generation applications, such as personalized playback. Using immersive loudspeaker and headphone MUSHRA tests, we investigate bitrate vs. quality for a typical mix presentation use case of a foreground stereo element, plus a background Ambisonics scene. For coding, we use Immersive Audio Model and Formats, a recently proposed system for Next-Generation Audio. Excellent quality is achieved at 384 kbit/s even with reasonable amount of personalization. We also propose a framework for content-aware analysis that can significantly reduce the bitrate when using underlying legacy audio coding instances. View details
    Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
    Melissa Barnhart Wantland
    Mai Kobori
    Universal Access in Human-Computer Interaction, Springer-Verlag (2025)
    Preview abstract Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interview with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones, however identifying accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community. View details
    A Strategic Framework for AI Product Development and Evaluation in Enterprise Software
    International Journal of Computer Engineering and Technology (IJCET), Volume 16, Issue 1 (2025)
    Preview abstract This article presents a comprehensive framework for developing and evaluating AI products in enterprise software systems, addressing the critical challenges organizations face during AI transformation initiatives. The article introduces a structured approach to decision-making for AI integration, encompassing ROI evaluation, user value assessment, and business impact analysis. It establishes distinct methodologies for both assistive and autonomous AI systems, providing detailed metrics for measuring success and performance across different implementation scenarios. Across various industries, the framework has shown potential in reducing implementation time, increasing user adoption rates, and enhancing overall project success rates, highlighting its practical applicability. The article methodology combines theoretical analysis with practical case studies, resulting in a flexible yet robust framework that can adapt to various organizational contexts. The framework's primary contribution lies in its practical approach to bridging the gap between theoretical AI capabilities and real-world implementation challenges, offering product leaders a systematic methodology for AI product development and evaluation. By addressing both current implementation challenges and future scalability requirements, this framework provides organizations with a foundational tool for navigating their AI transformation journey while maintaining a focus on measurable business outcomes and user value creation. View details
    Preview abstract Specific quantum algorithms exist to—in theory— break elliptic curve cryptographic protocols. Implementing these algorithms requires designing quantum circuits that perform elliptic curve arithmetic. To accurately judge a cryptographic protocol’s resistance against future quantum computers, researchers figure out minimal resource-count circuits for performing these operations while still being correct. To assure the correctness of a circuit, it is integral to restore all ancilla qubits used to their original states. Failure to do so could result in decoherence of the computation’s final result. Through rigorous classical simulation and unit testing, I surfaced four inconsistencies in the state-ofthe-art quantum circuit for elliptic curve point addition where the circuit diagram states the qubits are returned in the original (|0⟩) state, but the intermediate values are not uncomputed. I provide fixes to the circuit without increasing the leading-order gate cost. View details
    Fast Tensor Completion via Approximate Richardson Iteration
    Mehrdad Ghadiri
    Yunbum Kook
    Ali Jadbabaie
    Proceedings of the 42nd International Conference on Machine Learning (2025)
    Preview abstract We study tensor completion (TC) through the lens of low-rank tensor decomposition (TD). Many TD algorithms use fast alternating minimization methods, which solve highly structured linear regression problems at each step (e.g., for CP, Tucker, and tensor-train decompositions). However, such algebraic structure is lost in TC regression problems, making direct extensions unclear. To address this, we propose a lifting approach that approximately solves TC regression problems using structured TD regression algorithms as blackbox subroutines, enabling sublinear-time methods. We theoretically analyze the convergence rate of our approximate Richardson iteration based algorithm, and we demonstrate on real-world tensors that its running time can be 100x faster than direct methods for CP completion. View details
    Quantum simulation with sum-of-squares spectral amplification
    Robbie King
    Guang Hao Low
    Rolando Somma
    arXiv:2505.01528 (2025)
    Preview abstract We introduce sum-of-squares spectral amplification (SOSSA), a framework for improving quantum simulation algorithms relevant to low-energy problems. SOSSA first represents the Hamiltonian as a sum-of-squares and then applies spectral amplification to amplify the low-energy spectrum. The sum-of-squares representation can be obtained using semidefinite programming. We show that SOSSA can improve the efficiency of traditional methods in several simulation tasks involving low-energy states. Specifically, we provide fast quantum algorithms for energy and phase estimation that improve over the state-of-the-art in both query and gate complexities, complementing recent results on fast time evolution of low-energy states. To further illustrate the power of SOSSA, we apply it to the Sachdev-Ye-Kitaev model, a representative strongly correlated system, where we demonstrate asymptotic speedups by a factor of the square root of the system size. Notably, SOSSA was recently used in [G.H. Low \textit{et al.}, arXiv:2502.15882 (2025)] to achieve state-of-art costs for phase estimation of real-world quantum chemistry systems. View details
    Linear Elastic Caching via Ski Rental
    Todd Lipcon
    The biennial Conference on Innovative Data Systems Research (2025)
    Preview abstract In this work we study the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of not just its misses, but also its memory footprint integrated over time. We demonstrate a theoretical connection to the classic ski rental problem and propose a practical algorithm that combines online caching algorithms with ski rental policies. We also introduce a lightweight machine learning-based algorithm for ski rental that is optimized for production workloads and is easy to integrate within existing database systems. Evaluations on both production workloads in Google Spanner and publicly available traces show that the proposed elastic caching approach can significantly reduce the total cache cost compared to traditional fixed-size cache policies. View details
    Software development is a team sport
    Jie Chen
    Alison Chang
    Rayven Plaza
    Marie Huber
    Claire Taylor
    IEEE Software (2025)
    Preview abstract In this article, we describe our human-centered research focused on understanding the role of collaboration and teamwork in productive software development. We describe creation of a logs-based metric to identify collaboration through observable events and a survey-based multi-item scale to assess team functioning. View details
    Study of Arterials in the City of Rio de Janeiro for Traffic Coordination
    Ori Rottenstreich
    Eliav Buchnik
    Danny Veikherman
    Dan Karliner
    Tom Kalvari
    Shai Ferster
    Ron Tsibulsky
    Jack Haddad
    2025
    Preview abstract Urban traffic congestion is a growing challenge, and optimizing signal timing strategies is crucial for improving traffic flow and reducing emissions. The coordination of signalized intersections improves both traffic operations and environmental aspects. Coordination is particularly important along arterials, sequences of signalized intersections that serve as the primary routes and carry a high volume of traffic. In this paper we analyze real data from the city of Rio de Janeiro to study properties of arterials. We refer to their length, the distance between intersections and to the properties of the traffic light plans such as cycle time. We then study their in practice level of coordination in terms of number of stops and their common locations along the arterials. We dive into particular arterials and provide insights that can be useful for efficient design of arterials in additional cities. Based on the analysis, we show how simple traffic properties can indicate the potential upon coordinating two adjacent intersections as part of an arterial in improving traffic performance. View details
    Fine-grained Measurement of Vehicle Delay Fairness
    Eliav Buchnik
    Tom Kalvari
    Jack Haddad
    Dan Karliner
    Danny Veikherman
    Ron Tsibulsky
    Shai Ferster
    Ori Rottenstreich
    2025
    Preview abstract Optimizing signal timing in traffic lights helps to improve traffic flow and reduce emissions through reducing delays. At intersections, vehicles from different movements observe different delays impacted by the traffic light plan. This paper analyzes delay fairness among various vehicles at intersections. We refer to three cities: Rio de Janeiro, Hamburg and Seattle with a total number of over 5100 intersections. We present an intuitive methodology to compute delay fairness based on Gini index, a common fairness measure in economics. We evaluate the fairness based on real traffic data and provide insights on the relationship of fairness with day hours and traffic demand. We also examine real changes in traffic light plans that occurred in practice to check whether improving delay is often aligned with increasing fairness. View details
    Preview abstract Judging an action’s safety requires knowledge of the context in which the action takes place. To human agents who act in various contexts, this may seem obvious: performing an action such as email deletion may or may not be appropriate depending on the email’s content, the goal (e.g., to erase sensitive emails or to clean up trash), and the type of email address (e.g., work or personal). Unlike people, computational systems have often had only limited agency in limited contexts. Thus, manually crafted policies and user confirmation (e.g., smartphone app permissions or network access control lists), while imperfect, have sufficed to restrict harmful actions. However, with the upcoming deployment of generalist agents that support a multitude of tasks (e.g., an automated personal assistant), we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual agent security (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies. View details