Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass-production techniques, which can sometimes allow us to perform operations many times in parallel for a cost comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass-production techniques offer no benefit under a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a cost reduction of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data-loading step.
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative, but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
This paper outlines a grammar of data analysis, as distinct from grammars of data manipulation. The primitives of this grammar are metrics and dimensions. We describe a Python implementation of this grammar called Meterstick, which is agnostic to the underlying data source, which may be a DataFrame or a SQL database.
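The grammar described above treats metrics as composable, data-source-agnostic objects and dimensions as grouping keys. The following is a minimal, hypothetical sketch of that idea in pure Python; the class, function, and parameter names (`Metric`, `Sum`, `compute_on`, `split_by`) are illustrative assumptions, not the actual Meterstick API.

```python
# Toy sketch of a "grammar of data analysis": metrics are composable
# objects, dimensions drive grouping, and the metric definition is
# independent of the underlying data source. Hypothetical names only.

class Metric:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn  # maps a list of row dicts to a number

    def compute_on(self, rows, split_by=None):
        # Without a dimension, compute over all rows; with one,
        # group rows by that dimension and compute per group.
        if split_by is None:
            return {self.name: self.fn(rows)}
        groups = {}
        for row in rows:
            groups.setdefault(row[split_by], []).append(row)
        return {key: {self.name: self.fn(g)} for key, g in sorted(groups.items())}

    def __truediv__(self, other):
        # Composing two metrics yields a new metric (here, a ratio).
        return Metric(f"{self.name}/{other.name}",
                      lambda rows: self.fn(rows) / other.fn(rows))


def Sum(column):
    return Metric(f"sum({column})", lambda rows: sum(r[column] for r in rows))


rows = [
    {"country": "US", "clicks": 3, "impressions": 10},
    {"country": "US", "clicks": 1, "impressions": 10},
    {"country": "DE", "clicks": 2, "impressions": 5},
]
ctr = Sum("clicks") / Sum("impressions")
print(ctr.compute_on(rows))                      # overall click-through rate
print(ctr.compute_on(rows, split_by="country"))  # one rate per dimension value
```

The point of the design is that `ctr` is defined once and could be evaluated against a DataFrame or compiled to SQL by swapping the `compute_on` backend.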
Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMs
Eddie Ungless
Beka Gulotta
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (2025)
We explore large language model (LLM) responses that may negatively impact the transgender and nonbinary (TGNB) community and introduce the Transing Transformers Toolkit, T3, which provides resources for identifying such harmful response behaviors. The heart of T3 is a community-centred taxonomy of harms, developed in collaboration with the TGNB community, which we complement with, amongst other guidance, suggested heuristics for evaluation. To develop the taxonomy, we adopted a multi-method approach that included surveys and focus groups with community experts. The contribution highlights the importance of community-centred approaches in mitigating harm, and outlines pathways for LLM developers to improve how their models handle TGNB-related topics.
Beyond Digital Literacy: Building Youth Digital Resilience Through Existing “Information Sensibility” Practices
Mia Hassoun
Ian Beacock
Todd Carmody
Patrick Gage Kelley
Beth Goldberg
Devika Kumar
Laura Murray
Rebekah Park
Behzad Sarmadi
Social Sciences Journal, 14(4) (2025)
Youth media consumption and disordered eating practices have historically been subjects of moral panics, often resulting in protective, deficit-based interventions like content removal. We argue for interventions which instead equip youth to evaluate and manage risks in their online environments, building upon their existing “information sensibility” practices. Drawing upon ethnographic research and intervention testing with 77 participants in the US and India, we analyze how youth (aged 13–26), including those with diverse political perspectives and those recovering from disordered eating (DE), engage with online news and health information. Participants generally encountered information online algorithmically (rather than searching for it), and their engagement was shaped more by social motivations—like belonging—than truth seeking. Participants interpreted online information collaboratively, relying on social cues and peer validation within their online communities. They demonstrated preference for personal testimonies and relatable sources, particularly those with similar social identities. We propose resilience-building interventions that build upon these existing youth information practices by: (1) leveraging peer networks, promoting critical information engagement through collaborative learning and peer-to-peer support within online communities; (2) developing social media sensibility, equipping youth to critically evaluate information sources in situ; (3) providing pathways offline, connecting youth to desired in-person communities; and (4) encouraging probabilistic thinking.
Life at the Boundary of Chemical Kinetics and Program Execution
Thomas Fischbacher
Physical Review E (2025)
This work introduces a generic quantitative framework for studying processes that involve interactions of polymer sequences. Possible applications range from quantitative studies of the reaction kinetics of polymerization processes to explorations of the behavior of chemical implementations of computational, including basic life-like, processes. In this way, we establish a bridge between thermodynamic and computational aspects of systems that are defined in terms of sequence interactions. As a by-product of these investigations, we clarify some common confusion around the notion of "autocatalysis".
Using a Markov process model of polymer sequence composition, with dynamical evolution of the Markov process's parameters via an ODE that arises when taking both the "chemical" many-particle limit and the "rarefied interactions" limit, this approach enables, for example, accurate quantitative explorations of entropy generation in systems where computation is driven by relaxation to thermodynamic equilibrium. The computational framework internally uses the Scheme programming language's intrinsic continuation mechanisms to provide nondeterministic evaluation primitives. These allow the user to specify example systems in straightforward, purely functional code, making the exploration of all relevant sequence-composition constellations, which would otherwise be tedious to write code for, automatic and hidden from the user.
The original motivation for this work came from investigations into emergent program evolution in computational substrates of the form discussed in recent work on "Computational Life" \cite{alakuijala2024computational}. A major focus is therefore on giving a deeper explanation of the key requirements for the possible emergence of self-replicators, especially in settings whose behavior is governed by real-world physics rather than ad hoc rules that may be difficult to implement in a physical system. A collection of fully worked-out examples elucidates how this modeling approach relates quantitatively to Metropolis Monte Carlo simulations as well as exact or approximate analytic approaches, and how it can be used to study a broad range of different systems. These examples can also serve as starting points for further explorations.
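The nondeterministic evaluation primitive mentioned above, enumerating every relevant combination of choices without hand-written nested loops, can be imitated in Python with a recursive generator playing the role of Scheme's captured continuations. This is only an illustrative analogue, not the paper's Scheme framework; all names are hypothetical.

```python
def amb(*domains):
    # Depth-first enumeration of all choice combinations. Each `yield`
    # plays the role of capturing a continuation and resuming it with
    # the next alternative from the current domain.
    if not domains:
        yield ()
        return
    head, *rest = domains
    for x in head:
        for tail in amb(*rest):
            yield (x,) + tail


# Example: enumerate every ordered pair of monomer species that could
# react to form a dimer, without writing the nested loops by hand.
species = ["A", "B", "C"]
pairs = [a + b for a, b in amb(species, species)]
print(len(pairs))  # 9 ordered dimers: AA, AB, ..., CC
```

Adding a third binding site is just one more argument to `amb`, which is the sense in which the enumeration stays "automatic and hidden from the user".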
Inspect or Guess? Mechanism Design with Unobservable Inspection
Azarakhsh Malekian
Ali Daei Naby
The 21st Conference on Web and Internet Economics (WINE) (2025) (to appear)
We study the problem of selling $k$ units of an item to $n$ unit-demand buyers to maximize revenue, where buyers' values are independently (and not necessarily identically) distributed. The buyers' values are initially unknown but can be learned at a cost through inspection sources. Motivated by applications in e-commerce, where the inspection is unobservable by the seller (i.e., buyers can externally inspect their values without informing the seller), we introduce a framework to find the optimal selling strategy in this setting. We fully characterize the optimal mechanism for selling to a single buyer, subject to an upper bound on the allocation probability. Building on this characterization and leveraging connections to the \emph{Prophet Inequality}, we design an approximation mechanism for selling $k$ items to $n$ buyers that achieves $1-1/\sqrt{k+3}$ of the optimal revenue. Our mechanism is simple and sequential and achieves the same approximation bound in an online setting, remaining robust to the order of buyer arrivals. Additionally, in a setting with observable inspection, we leverage connections to index-based \emph{committing policies} in \emph{Weitzman's Pandora's problem with non-obligatory inspection} and propose a new sequential mechanism for selling an item to $n$ buyers that significantly improves the existing approximation factor to the optimal revenue from $0.5$ to $0.8$.
Society-Centric Product Innovation in the Era of Customer Obsession
International Journal of Science and Research Archive (IJSRA), Volume 14 - Issue 1 (2025)
This article provides a comprehensive analysis of the evolving innovation landscape in the technology sector, with a focus on the intersection of technological progress and social responsibility. It explores key challenges facing the industry, including the erosion of public trust, digital privacy concerns, and the impact of automation on workforce dynamics. It also investigates the emergence and implementation of responsible innovation frameworks across various organizations, highlighting the shift from traditional development approaches to more society-centric models. By examining how companies balance the pace of innovation with ethical responsibilities, integrate social considerations into their development processes, and address digital inequities across diverse demographics, the article underscores the transformative potential of these frameworks. Through insights into cross-functional teams, impact assessment tools, and stakeholder engagement strategies, it demonstrates how responsible innovation drives both sustainable business value and societal progress.
Balancing AI and Human Insights in Scientific Discovery: Challenges and Guidelines
Javier García-Martínez
Pilar Manchon
Ricardo Vinuesa
Sergio Hoyas
The Innovation (2025)
Recent advancements in large language models (LLMs) have enabled AI systems to autonomously assist in scientific research, from hypothesis generation to laboratory experimentation, transforming how research proposals are written and experiments are designed. Tools like AI "co-scientists" promise to enhance scientific productivity but raise concerns about diminishing human intuition, reinforcing incremental research, and concentrating power among a few entities. As LLMs become increasingly integrated into research processes, there is a risk of reduced creativity, ethical misconduct, and overreliance on AI-driven evaluation systems. To address these challenges, in this article we propose ethical guidelines focusing on transparency, accountability, fairness, and safeguarding transformative research. Ultimately, AI should be used to augment, not replace, human insight in scientific discovery.
We discuss the challenges posed by growing machine learning workloads on datacenter networks and present how Google’s Jupiter network fabrics effectively support diverse traffic.
Styled Handwritten Text Generation (HTG) has recently received attention from the computer vision and document analysis communities, which have developed several solutions, either GAN- or diffusion-based, that achieve promising results. Nonetheless, these strategies fail to generalize to novel styles and have technical constraints, particularly in terms of maximum output length and training efficiency. To overcome these limitations, in this work we propose a novel framework for text image generation, dubbed Emuru. Our approach leverages a powerful text image representation model (a variational autoencoder) combined with an autoregressive Transformer, and enables the generation of styled text images conditioned on textual content and style examples, such as specific fonts or handwriting styles. We train our model solely on a diverse, synthetic dataset of English text rendered in over 100,000 typewritten and calligraphy fonts, which gives it the capability to reproduce unseen styles (both fonts and users' handwriting) zero-shot. To the best of our knowledge, Emuru is the first autoregressive model for HTG, and the first designed specifically for generalization to novel styles. Moreover, our model generates images without background artifacts, which makes them easier to use in downstream applications. Extensive evaluation on both typewritten and handwritten text image generation of any length demonstrates the effectiveness of our approach.
Origin-destination travel demand estimation: an approach that scales worldwide, and its application to five metropolitan highway networks
Christopher Bian
Yechen Li
Willa Ng
Bin Yan
Janny Zhang
2025
Estimating Origin-Destination (OD) travel demand is vital for effective urban planning and traffic management. Developing universally applicable OD estimation methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic data and the difficulty of obtaining city-specific prior OD estimates (or seed ODs), which are often a prerequisite for traditional approaches. Our proposed method directly estimates OD travel demand by systematically leveraging aggregated, anonymized statistics from Google Maps Traffic Trends, obviating the need for conventional census or city-provided OD data. The OD demand is estimated by formulating a single-level, one-dimensional, continuous nonlinear optimization problem with nonlinear equality and bound constraints to replicate highway path travel times. The method achieves efficiency and scalability by employing a differentiable analytical macroscopic network model. This model is computationally lightweight by design, distinguished by a parsimonious parameterization that requires minimal calibration effort and by its capacity for instantaneous evaluation. These attributes ensure the method's broad applicability and practical utility across diverse cities globally. Using segment sensor counts from the Los Angeles and San Diego highway networks, we validate our proposed approach, demonstrating a two-thirds to three-quarters improvement over a baseline in the fit to segment count data. Beyond validation, we establish the method's scalability and robust performance in replicating path travel times across diverse highway networks, including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded evaluations, our method not only aligns with simulation-based benchmarks but also achieves an average 13% improvement over the baseline in its ability to fit travel time data during afternoon peak hours.
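The core mechanism, fitting demand so that a differentiable macroscopic model reproduces observed path travel times under bound constraints, can be caricatured on a single-path toy network. Everything below (the three-segment path, the standard BPR volume-delay curve, the single scalar demand, and all numbers) is an illustrative assumption, not the paper's actual formulation.

```python
# Toy calibration: find a scalar OD demand d (veh/h) such that the modeled
# path travel time matches an observed time. Segment data are made-up
# (free-flow time t0 in minutes, capacity c in veh/h).

segments = [(2.0, 1800.0), (3.5, 2200.0), (1.5, 1600.0)]  # (t0, capacity)
observed_path_time = 9.5  # minutes, hypothetical measurement

def path_time(d):
    # Standard BPR volume-delay curve per segment: t = t0 * (1 + 0.15*(v/c)^4),
    # with volume v = d on every segment of this single-OD toy path.
    return sum(t0 * (1 + 0.15 * (d / c) ** 4) for t0, c in segments)

def d_path_time(d):
    # Analytic derivative: the macroscopic model is differentiable by
    # construction, which is what makes calibration cheap.
    return sum(t0 * 0.6 * d ** 3 / c ** 4 for t0, c in segments)

# Newton iteration on the travel-time residual, with a lower bound on d.
d = 1000.0
for _ in range(60):
    residual = path_time(d) - observed_path_time
    d = max(d - residual / d_path_time(d), 1.0)

print(f"demand {d:.0f} veh/h reproduces path time {path_time(d):.2f} min")
```

The real method solves for a full OD matrix across a highway network, but the same ingredients appear: a lightweight differentiable travel-time model, equality constraints on path times, and bounds on the demand variables.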
Mix&Slice
Marco Rosa
Encyclopedia of Cryptography, Security and Privacy, Springer Nature Switzerland (2025), pp. 1550-1555
Mix&Slice is an encryption technique that enables efficient and robust access revocation on resources stored at external cloud providers. The technique makes use of a transformation that provides strong inter-dependency in the encrypted representation of a resource. To perform access revocation, it is then sufficient to re-encrypt a small portion of the resource to guarantee that the resource (and any of its parts) becomes unintelligible to those from whom access has been revoked.
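The inter-dependency property can be sketched with a toy diffusion pass: after mixing, every output block depends on every input block, so re-encrypting any single fragment with a fresh key leaves revoked users unable to reconstruct anything. This is only an illustration of the all-or-nothing flavor; the actual Mix&Slice construction is built from invertible block-cipher rounds, not the one-way hash mixing used here.

```python
import hashlib

BLOCK = 16  # bytes per mini-block (illustrative size)

def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()[:BLOCK]

def mix(blocks):
    """One forward and one backward diffusion pass. Afterwards every
    output block depends on every input block (toy, non-invertible)."""
    blocks = list(blocks)
    for i in range(1, len(blocks)):            # forward: i depends on 0..i
        blocks[i] = bytes(x ^ y for x, y in zip(blocks[i], _h(blocks[i - 1])))
    for i in range(len(blocks) - 2, -1, -1):   # backward: now on everything
        blocks[i] = bytes(x ^ y for x, y in zip(blocks[i], _h(blocks[i + 1])))
    return blocks

resource = bytes(range(128))  # 8 mini-blocks of 16 bytes
blocks = [resource[i:i + BLOCK] for i in range(0, len(resource), BLOCK)]
mixed = mix(blocks)

# Flip one bit in one input block: every mixed block changes. This is why
# re-encrypting a single fragment of the mixed resource with a fresh key
# suffices to make the whole resource unintelligible after revocation.
tampered = blocks[:]
tampered[3] = bytes([tampered[3][0] ^ 1]) + tampered[3][1:]
remixed = mix(tampered)
print(sum(a != b for a, b in zip(mixed, remixed)))  # all 8 blocks differ
```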
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Brian Gordon
Michal Sokolik
Almog Gueta
Royi Rassin
Dani Lischinski
2025
Subject-driven text-to-image (T2I) generation aims to produce images that align with a given textual description, while preserving the visual identity from a referenced subject image. Despite its broad downstream applicability - ranging from enhanced personalization in image generation to consistent character representation in video rendering - progress in this field is limited by the lack of reliable automatic evaluation. Existing methods either assess only one aspect of the task (i.e., textual alignment or subject preservation), misalign with human judgments, or rely on costly API-based evaluation. To address this gap, we introduce RefVNLI, a cost-effective metric that evaluates both textual alignment and subject preservation in a single run. Trained on a large-scale dataset derived from video-reasoning benchmarks and image perturbations, RefVNLI outperforms or statistically matches existing baselines across multiple benchmarks and subject categories (e.g., Animal, Object), achieving up to 6.4-point gains in textual alignment and 5.9-point gains in subject preservation.
Ethical Co-Development of AI Applications with Indigenous Communities
Claudio Pinhanez
Edem Wornyo
2025
This course explores how researchers and practitioners can engage ethically with Indigenous communities when developing AI- and data-intensive applications. Key issues such as fair engagement, legal constraints, reciprocity, and informed consent are discussed based on examples drawn from the instructors’ experience. The course also examines good practices in co-design and co-development processes, data governance and sovereignty issues and systems, decolonial software licensing, and processes of technology transfer and appropriation. In its practical part, the course critically discusses examples and cases gathered from the audience to explore the diversity of issues and solutions that arise when working with Indigenous communities.