Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1–15 of 10,458 publications
YETI (YET to Intervene) Proactive Interventions by Multimodal AI Agents in Augmented Reality Tasks
Saptarashmi Bandyopadhyay
Vikas Bahirwani
Lavisha Aggarwal
Bhanu Guda
Lin Li
Andrea Colaco
2025
Abstract
Multimodal AI Agents are AI models that can interactively and cooperatively assist human users with day-to-day tasks. Augmented Reality (AR) head-worn devices can uniquely improve the user experience of solving procedural day-to-day tasks by providing egocentric multimodal (audio and video) observational capabilities to AI Agents. Such AR capabilities can help AI Agents see and listen to the actions that users take, akin to the multimodal capabilities of human users. Existing AI Agents, whether Large Language Models (LLMs) or Multimodal Vision-Language Models (VLMs), are reactive in nature: they cannot take an action without first reading or listening to the human user's prompts. Proactivity, on the other hand, can help the human user detect and correct mistakes in agent-observed tasks, encourage users when they do tasks correctly, or simply engage in conversation with the user, akin to a human teacher or assistant. Our proposed YET to Intervene (YETI) multimodal Agent focuses on the research question of identifying circumstances that may require the Agent to intervene proactively. This allows the Agent to understand when it can intervene in a conversation with a human user to help the user correct mistakes on tasks, like cooking, using Augmented Reality. Our YETI Agent learns scene-understanding signals based on interpretable notions of Structural Similarity (SSIM) on consecutive video frames. We also define an alignment signal with which the AI Agent can learn to identify whether the video frames corresponding to the user's actions on the task are consistent with the expected actions. These signals are used by our AI Agent to determine when it should proactively intervene. We compare our results on the instances of proactive intervention in the HoloAssist multimodal benchmark, in which an expert agent guides a user agent to complete procedural tasks.
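As a rough illustration of the scene-change signal described above, here is a minimal sketch that flags candidate intervention points from SSIM on consecutive frames. It assumes grayscale uint8 NumPy frames and scikit-image; the threshold is ours, not the paper's.

```python
# Minimal sketch: flag candidate intervention points via consecutive-frame SSIM.
from skimage.metrics import structural_similarity as ssim

def intervention_candidates(frames, threshold=0.75):
    """Yield (index, score) where a large visual change suggests a new user action."""
    for i in range(1, len(frames)):
        score = ssim(frames[i - 1], frames[i], data_range=255)  # grayscale uint8 frames
        if score < threshold:   # low similarity: the scene changed between frames
            yield i, score
```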
How Unique is Whose Web Browser? The role of demographics in browser fingerprinting
Pritish Kamath
Robin Lassonde
2025
Abstract
Web browser fingerprinting can be used to identify and track users across the Web, even without cookies, by collecting attributes from users' devices to create unique "fingerprints". This technique and its resulting privacy risks have been studied for over a decade, yet further research has been limited because prior studies did not openly publish their data. Additionally, the data in prior studies were biased and lacked user demographics.
Here we publish a first-of-its-kind open dataset that includes browser attributes together with users' demographics, collected from 8,400 US study participants with their informed consent. As part of data collection, we also ran an experiment to study what affects users' likelihood of sharing browser data for open research, in order to inform future data collection efforts, gathering survey responses from a total of 12,461 participants. Female participants were significantly less likely to share their browser data, as were participants who were shown the browser data we asked to collect.
In addition, we demonstrate how fingerprinting risks differ across demographic groups. For example, we find that lower-income users are more at risk, and that as users' age increases they are both more likely to be concerned about fingerprinting and more likely to be at real risk of it. Furthermore, we demonstrate an overlooked risk: user demographics, such as gender, age, income level, ethnicity and race, can be inferred from browser attributes commonly used for fingerprinting, and we identify which browser attributes contribute most to this risk.
Overall, we show the important role of user demographics in ongoing work to assess fingerprinting risks and improve user privacy, with findings that can inform future privacy-enhancing browser developments. The dataset and data collection tool we openly publish can be used to study further research questions not addressed in this work.
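To make the notion of fingerprint uniqueness concrete, here is a minimal sketch (ours, with illustrative attribute names, not the study's schema) that computes each record's anonymity-set size and surprisal from a list of attribute dictionaries; attribute values are assumed hashable.

```python
# Sketch: per-record fingerprint uniqueness as anonymity-set size and surprisal.
import math
from collections import Counter

def uniqueness(records, attrs=("user_agent", "screen", "timezone", "language")):
    keys = [tuple(r.get(a) for a in attrs) for r in records]  # one fingerprint per record
    counts = Counter(keys)
    n = len(keys)
    for key in keys:
        k = counts[key]                  # anonymity-set size: how many share this print
        surprisal = -math.log2(k / n)    # in bits: higher means more identifiable
        yield k, surprisal
```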
Abstract
The problem of contract design addresses the challenge of moral hazard in principal-agent setups. The agent exerts costly effort that produces a random outcome with an associated reward for the principal. Moral hazard refers to the tension that the principal cannot observe the agent's effort level and hence must incentivize the agent only by rewarding the realized outcome of the effort, i.e., through the contract. Bayesian contract design studies the principal's problem of designing an optimal contract when facing an unknown agent characterized by a private Bayesian type. In its most general form, the agent's type is inherently "multi-parameter" and can arbitrarily affect both the agent's productivity and effort costs. In contrast, a natural single-parameter setting of much recent interest simplifies the agent's type to a single value describing the agent's cost per unit of effort, while agents' efforts are assumed to be equally productive.
The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance I^M, we construct a single-parameter instance I^S such that any β-approximate contract (resp. menu of contracts) of I^S can in turn be converted to a (β − ϵ)-approximate contract (resp. menu of contracts) of I^M. The reduction runs in time polynomial in the input size and log(1/ϵ); moreover, when β = 1 (i.e., the given single-parameter solution is exactly optimal), the dependence on 1/ϵ can be removed, leading to a polynomial-time exact reduction. This efficient reduction is somewhat surprising because, in the closely related problem of Bayesian mechanism design, a polynomial-time reduction from the multi-parameter to the single-parameter setting is believed not to exist. Our result demonstrates the intrinsic difficulty of addressing moral hazard in Bayesian contract design, whether single-parameter or multi-parameter.
As byproducts, our reduction answers two open questions in the recent literature on algorithmic contract design: (a) it implies that optimal contract design in single-parameter BCD is not in APX unless P=NP, even when the agent's type distribution is regular, answering the open question of [3] in the negative; (b) it implies that the principal's (order-wise) tight utility gap between using a menu of contracts and a single contract is Θ(n), where n is the number of actions, answering the major open question of [27] for the single-parameter case.
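For readers new to the model, a standard way to write the single-parameter setup sketched above (notation ours, not the paper's):

```latex
% Single-parameter moral hazard (notation ours): the agent's private type c
% is a cost per unit of effort; action i requires effort e_i and induces
% outcome j with probability p_{ij}; a contract pays t_j \ge 0 on outcome j.
\[
  i^*(c) \in \arg\max_i \Big( \sum_j p_{ij} t_j - c\, e_i \Big)
  \qquad \text{(agent's best response)}
\]
\[
  \max_{t \ge 0} \ \mathbb{E}_{c \sim F}\Big[ \sum_j p_{i^*(c)\,j}\, (r_j - t_j) \Big]
  \qquad \text{(principal's problem, reward } r_j\text{)}
\]
```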
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
Zheng Lim
Honglin Yu
Trevor Cohn
International Conference on Learning Representations (ICLR) 2025
Abstract
Multilingual large language models (LLMs) are great translators, but this largely holds only for high-resource languages. For many LLMs, translating into and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a post-editing one and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show that LLMs finetuned on Mufu-style prompts are robust to poor-quality auxiliary translation candidates, achieving performance superior to the NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining an average 3.1 chrF improvement over the finetune-only baseline in low-resource translations.
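As a concrete (hypothetical) illustration of how such a prompt might look, the helper below turns a source sentence and auxiliary candidates into a Mufu-style post-editing prompt; the template wording is ours, not the paper's.

```python
# Sketch of a Mufu-style post-editing prompt; the template is an assumption.
def mufu_prompt(source, candidates, target_lang):
    """Frame translation as post-editing over auxiliary candidate translations."""
    lines = [
        f"Translate the following sentence into {target_lang}.",
        f"Source: {source}",
        "Candidate translations (some may be inaccurate). Assess their quality,"
        " copy what is correct, and fix what is not:",
    ]
    for i, cand in enumerate(candidates, 1):
        lines.append(f"Candidate {i}: {cand}")
    lines.append("Corrected translation:")
    return "\n".join(lines)
```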
Benchmarking and improving algorithms for attributing satellite-observed contrails to flights
Vincent Rudolf Meijer
Rémi Chevallier
Allie Duncan
Kyle McConnaughay
Atmospheric Measurement Techniques, 18 (2025), pp. 3495-3532
Abstract
Condensation trail (contrail) cirrus clouds cause a substantial fraction of aviation's climate impact. One proposed method for the mitigation of this impact involves modifying flight paths to avoid particular regions of the atmosphere that are conducive to the formation of persistent contrails, which can transform into contrail cirrus. Determining the success of such avoidance maneuvers can be achieved by ascertaining which flight formed each nearby contrail observed in satellite imagery. The same process can be used to assess the skill of contrail forecast models. The problem of contrail-to-flight attribution is complicated by several factors, such as the time required for a contrail to become visible in satellite imagery, high air traffic densities, and errors in wind data. Recent work has introduced automated algorithms for solving the attribution problem, but it lacks an evaluation against ground-truth data. In this work, we present a method for producing synthetic contrail detections with predetermined contrail-to-flight attributions that can be used to evaluate – or “benchmark” – and improve such attribution algorithms. The resulting performance metrics can be employed to understand the implications of using these observational data in downstream tasks, such as forecast model evaluation and the analysis of contrail avoidance trials, although the metrics do not directly quantify real-world performance. We also introduce a novel, highly scalable contrail-to-flight attribution algorithm that leverages the characteristic compounding of error induced by simulating contrail advection using numerical weather models. The benchmark shows an improvement of approximately 25 % in precision versus previous contrail-to-flight attribution algorithms, without compromising recall.
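For intuition, a bare-bones nearest-flight matcher is sketched below; `advect` is a hypothetical helper that predicts a flight's plume positions at the detection time, coordinates are assumed to live in a km-scale projection, and the paper's algorithm additionally exploits how advection error compounds.

```python
# Sketch of nearest-flight contrail attribution after wind-driven advection.
import numpy as np

def attribute_contrails(detections, flights, advect, max_km=20.0):
    """Assign each detected contrail to the nearest advected flight, if close enough."""
    assignments = {}
    for det_id, det_time, det_points in detections:   # det_points: (N, 2) array
        best_id, best_d = None, np.inf
        for flight_id, waypoints in flights:
            pred = advect(waypoints, det_time)        # (M, 2) predicted positions
            d = np.linalg.norm(pred[None, :, :] - det_points[:, None, :],
                               axis=-1).min()         # closest point pair (km)
            if d < best_d:
                best_id, best_d = flight_id, d
        assignments[det_id] = best_id if best_d <= max_km else None
    return assignments
```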
Abstract
Styled Handwritten Text Generation (HTG) has recently received attention from the computer vision and document analysis communities, which have developed several GAN- and diffusion-based solutions with promising results. Nonetheless, these strategies fail to generalize to novel styles and have technical constraints, particularly in terms of maximum output length and training efficiency. To overcome these limitations, we propose a novel framework for text image generation, dubbed Emuru. Our approach leverages a powerful text image representation model (a variational autoencoder) combined with an autoregressive Transformer, and enables the generation of styled text images conditioned on textual content and style examples, such as specific fonts or handwriting styles. We train our model solely on a diverse synthetic dataset of English text rendered in over 100,000 typewritten and calligraphy fonts, which gives it the capability to reproduce unseen styles (both fonts and users' handwriting) zero-shot. To the best of our knowledge, Emuru is the first autoregressive model for HTG and the first designed specifically to generalize to novel styles. Moreover, our model generates images without background artifacts, which makes them easier to use in downstream applications. Extensive evaluation on both typewritten and handwritten text image generation of any length demonstrates the effectiveness of our approach.
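A schematic of the two-stage generation described above, with component interfaces that are ours (hypothetical), not the authors':

```python
# Sketch of Emuru-style generation: a VAE encodes a style example into latent
# tokens, an autoregressive Transformer extends the latent sequence conditioned
# on the target text, and the VAE decoder renders the image. All interfaces
# here are hypothetical; the stopping criterion is simplified to a fixed length.
import torch

@torch.no_grad()
def generate(vae, ar_transformer, tokenizer, style_image, text, max_latents=256):
    latents = vae.encode(style_image)                 # (1, S, D) style latent tokens
    text_ids = tokenizer(text, return_tensors="pt").input_ids
    for _ in range(max_latents):                      # autoregressive rollout
        nxt = ar_transformer(text_ids=text_ids, latents=latents)  # (1, 1, D)
        latents = torch.cat([latents, nxt], dim=1)
    return vae.decode(latents)                        # styled text image
```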
Matryoshka Model Learning for Improved Elastic Student Models
Chetan Verma
Aditya Srinivas Timmaraju
Cho-Jui Hsieh
Ngot Bui
Yang Zhang
Wen Chen
Xin Liu
Inderjit Dhillon
2025
Abstract
Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity; they thus allow Student models to better relate to the Teacher model and bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options that trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating a 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.
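The two-hop distillation idea can be illustrated with a standard soft-label KL loss; the weights and temperature below are ours, and extracting multiple nested Students from the TA is not shown.

```python
# Sketch of Teacher -> TA -> Student distillation with a soft-label KL loss.
import torch.nn.functional as F

def distill_loss(student_logits, ta_logits, teacher_logits, labels,
                 T=2.0, alpha=0.5, beta=0.3):
    """Hard-label CE plus KL to the TA and (more weakly) to the Teacher."""
    hard = F.cross_entropy(student_logits, labels)
    def kl(s, t):
        return F.kl_div(F.log_softmax(s / T, dim=-1),
                        F.softmax(t / T, dim=-1),
                        reduction="batchmean") * (T * T)
    # The TA, being closer in capacity, gives the Student an easier target
    # than distilling from the much larger Teacher directly.
    return hard + alpha * kl(student_logits, ta_logits) \
                + beta * kl(student_logits, teacher_logits)
```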
Fast electronic structure quantum simulation by spectrum amplification
Guang Hao Low
Robbie King
Dominic Berry
Qiushi Han
Albert Eugene DePrince III
Alec White
Rolando Somma
arXiv:2502.15882 (2025)
Abstract
The most advanced techniques for estimating the ground-state energy of a chemical Hamiltonian on fault-tolerant quantum computers involve compression of the Coulomb operator through tensor factorizations, enabling efficient block-encodings of the Hamiltonian. A natural challenge for these methods is the degree to which block-encoding costs can be reduced. We address this challenge through the technique of spectrum amplification, which magnifies the spectrum of the low-energy states of Hamiltonians that can be expressed as sums of squares. Spectrum amplification enables estimating ground-state energies with significantly improved cost scaling in the block-encoding normalization factor $\Lambda$, reducing it to just $\sqrt{2\Lambda E_{\text{gap}}}$, where $E_{\text{gap}} \ll \Lambda$ is the lowest energy of the sum-of-squares Hamiltonian. To achieve this, we show that sum-of-squares representations of the electronic structure Hamiltonian are efficiently computable by a family of classical simulation techniques that approximate the ground-state energy from below. To optimize further, we also develop a novel factorization that provides a trade-off between the two leading Coulomb integral factorization schemes, namely double factorization and tensor hypercontraction, and that, when combined with spectrum amplification, yields a factor of 4 to 195 speedup over the state of the art in ground-state energy estimation for models of iron-sulfur complexes and a CO$_{2}$-fixation catalyst.
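In terms of query complexity for phase estimation, the improvement claimed above can be summarized as follows (the constants and the $1/\epsilon$ dependence are our simplification):

```latex
% With a block-encoding of normalization \Lambda, ground-state energy
% estimation to accuracy \epsilon scales as O(\Lambda/\epsilon). For a
% sum-of-squares Hamiltonian H = \sum_k A_k^\dagger A_k with lowest energy
% E_{gap}, spectrum amplification improves the \Lambda-dependence:
\[
  O\!\left(\frac{\Lambda}{\epsilon}\right)
  \;\longrightarrow\;
  O\!\left(\frac{\sqrt{2\Lambda E_{\text{gap}}}}{\epsilon}\right),
  \qquad E_{\text{gap}} \ll \Lambda .
\]
```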
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang
Fan Yue
Jan Eric Lenssen
Liwei Wang
Bernt Schiele
2025
Abstract
Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on fixed parameters within linear projections, especially when architectural modifications (e.g., channel dimensions) are introduced. Each scaling iteration typically requires retraining the entire model from the beginning, leading to suboptimal utilization of computational resources. To overcome this limitation, we introduce TokenFormer, a naturally scalable architecture that leverages the attention mechanism exclusively for computations among input tokens and interactions between input tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformer with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This innovative approach allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124 million to 1.4 billion parameters by incrementally adding new key-value parameters, achieving performance comparable to models trained from scratch while greatly reducing training costs. Code and models will be publicly available.
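A minimal sketch of the token-parameter attention idea follows (our simplification: plain softmax and a single head; the paper's layer uses a modified normalization so that newly added, zero-initialized parameter tokens leave outputs unchanged).

```python
# Sketch: inputs act as queries; learnable parameter tokens are keys/values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParamAttention(nn.Module):
    def __init__(self, d_model, n_param_tokens):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_param_tokens, d_model))
        self.values = nn.Parameter(torch.randn(n_param_tokens, d_model))

    def forward(self, x):                         # x: (batch, seq, d_model)
        scores = x @ self.keys.t() / self.keys.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ self.values

    def grow(self, extra):
        """Scale the layer by appending new key-value parameter tokens."""
        d = self.keys.shape[-1]
        self.keys = nn.Parameter(torch.cat([self.keys.data, torch.zeros(extra, d)]))
        self.values = nn.Parameter(torch.cat([self.values.data, torch.zeros(extra, d)]))
```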
Opportunities and Applications of GenAI in Smart Cities: A User-Centric Survey
Shashank Kapoor
Aman Raj
IEEE COINS (2025)
Abstract
The proliferation of IoT in cities, combined with Digital Twins, creates a rich data foundation for Smart Cities aimed at improving urban life and operations. Generative AI (GenAI) significantly enhances this potential, moving beyond traditional AI analytics by processing multimodal content and generating novel outputs such as text and simulations. Using specialized or foundational models, GenAI's natural language abilities, such as Natural Language Understanding (NLU) and Generation (NLG), can power tailored applications and unified interfaces, dramatically lowering barriers for users interacting with complex smart city systems. In this paper, we focus on GenAI applications based on conversational interfaces within the context of three critical user archetypes in a Smart City: Citizens, Operators and Planners. We identify and review GenAI models and techniques that have been proposed or deployed for various urban subsystems in the contexts of these user archetypes. We also consider how GenAI can be built on the existing data foundation of official city records, IoT data streams and Urban Digital Twins. We believe this work represents the first comprehensive summary of GenAI techniques for Smart Cities through the lens of the critical users in a Smart City.
Society-Centric Product Innovation in the Era of Customer Obsession
International Journal of Science and Research Archive (IJSRA), Volume 14 - Issue 1 (2025)
Abstract
This article provides a comprehensive analysis of the evolving landscape of innovation in the technology sector, with a focus on the intersection of technological progress and social responsibility. It explores key challenges facing the industry, including the erosion of public trust, digital privacy concerns, and the impact of automation on workforce dynamics. It investigates the emergence and implementation of responsible innovation frameworks across various organizations, highlighting the transformation from traditional development approaches to more society-centric models. By examining how companies balance the pace of innovation with ethical responsibilities, integrate social considerations into their development processes, and address digital inequities across diverse demographics, the article underscores the transformative potential of these frameworks. Through insights into cross-functional teams, impact assessment tools, and stakeholder engagement strategies, it demonstrates how responsible innovation drives both sustainable business value and societal progress.
Deletion Robust Non-Monotone Submodular Maximization over Matroids
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Journal of Machine Learning Research, 26 (2025), pp. 1-28
Abstract
Maximizing a submodular function is a fundamental task in machine learning, and in this paper we study the deletion-robust version of the problem under the classic matroid constraint. Here the goal is to extract a small summary of the dataset that contains a high-value independent set even after an adversary has deleted some elements. We present constant-factor approximation algorithms whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(4.597+O(\varepsilon))$-approximation algorithm with summary size $O(\frac{k+d}{\varepsilon^2}\log \frac{k}{\varepsilon})$, improved to a $(3.582+O(\varepsilon))$-approximation with summary size $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$ when the objective is monotone. In the streaming setting we provide a $(9.435 + O(\varepsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$; the approximation factor improves to $(5.582+O(\varepsilon))$ in the monotone case.
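Schematically, deletion-robust algorithms follow a summary-then-recover pattern, sketched below with plain greedy for illustration only; greedy does not achieve the constant factors above, especially for non-monotone objectives.

```python
# Illustrative summary-then-recover pattern (not the paper's algorithm).
def greedy(elements, f, independent, budget):
    """Plain greedy under an independence constraint; f is a set-function oracle."""
    sol = set()
    while len(sol) < budget:
        candidates = [e for e in elements
                      if e not in sol and independent(sol | {e})]
        if not candidates:
            break
        best = max(candidates, key=lambda e: f(sol | {e}) - f(sol))
        if f(sol | {best}) - f(sol) <= 0:
            break
        sol.add(best)
    return sol

def robust_pipeline(elements, f, independent, k, d, deleted):
    pool = greedy(elements, f, lambda s: len(s) <= k + d, k + d)  # oversized summary
    survivors = [e for e in pool if e not in deleted]             # adversary deletes
    return greedy(survivors, f, independent, k)                   # recover a solution
```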
CountQA: How Well Do MLLMs Count in the Wild?
Jayant Tamarapalli
Rynaa Grover
Nilay Pande
Sahiti Yerramilli
(2025)
Abstract
While Multimodal Large Language Models (MLLMs) display a remarkable fluency in describing visual scenes, their ability to perform the fundamental task of object counting remains poorly understood. This paper confronts this issue by introducing CountQA, a challenging new benchmark composed of over 1,500 question-answer pairs centered on images of everyday, real-world objects, often in cluttered and occluded arrangements. Our evaluation of 15 prominent MLLMs on CountQA systematically investigates this weakness, revealing a critical failure of numerical grounding: the models consistently struggle to translate raw visual information into an accurate quantity. By providing a dedicated tool to probe this foundational weakness, CountQA paves the way for the development of more robust and truly capable MLLMs that are spatially aware and numerically grounded.
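An evaluation loop of the kind such a benchmark invites might look like the following sketch; the dataset fields and model API are assumptions, not CountQA's actual interface.

```python
# Sketch: exact-match accuracy on numeric answers; fields/API are assumptions.
import re

def count_accuracy(model, dataset):
    correct = 0
    for ex in dataset:                        # ex: {"image", "question", "answer"}
        reply = model.ask(ex["image"], ex["question"])
        m = re.search(r"-?\d+", reply)        # first integer in the free-form reply
        correct += bool(m and int(m.group()) == int(ex["answer"]))
    return correct / len(dataset)
```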
Triaging mammography with artificial intelligence: an implementation study
Sarah M. Friedewald
Sunny Jansen
Fereshteh Mahvar
Timo Kohlberger
David V. Schacht
Sonya Bhole
Dipti Gupta
Scott Mayer McKinney
Stacey Caron
David Melnick
Mozziyar Etemadi
Samantha Winter
Alejandra Maciel
Luca Speroni
Martha Sevenich
Arnav Agharwal
Rubin Zhang
Gavin Duggan
Shiro Kadowaki
Atilla Kiraly
Jie Yang
Basil Mustafa
Krish Eswaran
Shravya Shetty
Breast Cancer Research and Treatment (2025)
Abstract
Purpose
Many breast centers are unable to provide immediate results at the time of screening mammography, which results in delayed patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and accelerate the time to diagnostic imaging and biopsy diagnosis.
Methods
In this prospective randomized, unblinded, controlled implementation study we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results
The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for the analysis of TA and TB. In the control group, the TA was 25.6 days [95% CI 22.0–29.9] and the TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days [one-sided 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions
Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could improve patient adherence, decrease anxiety, and help address disparities in access to timely care.
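The Results above rely on a one-sided Mann-Whitney U test, which in SciPy looks like the sketch below; the day counts are placeholders, not study data.

```python
# Sketch: one-sided Mann-Whitney U test for shorter times in the experimental arm.
from scipy.stats import mannwhitneyu

# Placeholder day counts, NOT study data.
experimental_ta = [14.2, 20.1, 18.7, 9.5, 22.0]
control_ta = [25.6, 27.3, 24.0, 31.2, 29.9]

# alternative="less": is time-to-additional-imaging stochastically smaller?
stat, p = mannwhitneyu(experimental_ta, control_ta, alternative="less")
print(f"U={stat:.1f}, one-sided p={p:.4f}")
```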
Abstract
Online scams are a growing threat in India, impacting millions and causing substantial financial losses year over year. This white paper presents ShieldUp!, a novel mobile game prototype designed to inoculate users against common online scams by leveraging the principles of psychological inoculation theory. ShieldUp! exposes users to weakened versions of manipulation tactics frequently used by scammers and teaches them to recognize and pre-emptively refute these techniques. A randomized controlled trial (RCT) with 3,000 participants in India evaluated the game's efficacy in helping users better identify scam scenarios. Participants were assigned to one of three groups: the ShieldUp! group (play time: 15 min), a general scam awareness group (watching videos and reading tips for 10-15 min), and a control group (playing "Chrome Dino", an unrelated game, for 10 min). Scam discernment ability was measured using a newly developed Scam Discernment Ability Test (SDAT-10) before the intervention, immediately after, and at a 21-day follow-up. Results indicated that participants who played ShieldUp! showed a significant improvement in their ability to identify scams compared to the other two groups, and this improvement was maintained at follow-up. Importantly, while both interventions initially led users to show increased skepticism even towards genuine online offers (NOT-scam scenarios), this effect dissipated after 21 days, suggesting no long-term negative impact on user trust. This study demonstrates the potential of game-based inoculation as a scalable and effective scam prevention strategy, offering valuable insights for product design, policy interventions, and future research, including the need for longitudinal studies and cross-cultural adaptations.