Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 11091 publications
Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
Michael Madaio
Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. In this article, I situate the recent calls for participatory design of educational AI systems, promising as they are, within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place?
AI coding assistants are rapidly becoming integral to modern software development. A key challenge in this space is the continual need to migrate and modernize codebases in response to evolving software ecosystems. Traditionally, such migrations have relied on rule-based systems and human intervention. With the advent of powerful large language models (LLMs), AI-driven agentic frameworks offer a promising alternative—but their effectiveness remains underexplored. In this paper, we introduce FreshBrew, a novel benchmark for evaluating AI-based agentic frameworks on project-level Java migrations. We benchmark several such frameworks, powered by state-of-the-art LLMs, and compare their performance against established rule-based tools. Our evaluation of AI agents on this benchmark of 228 repositories shows that the top-performing model, Gemini 2.5 Flash, can successfully migrate 56.5% of projects to JDK 17. Our empirical analysis reveals novel insights into the critical strengths and limitations of current agentic approaches, offering actionable insights into their real-world applicability. By releasing FreshBrew publicly upon acceptance, we aim to facilitate rigorous, reproducible evaluation and catalyze progress in AI-driven codebase modernization.
ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding
Sunny Rajagopalan
Alireza Golestaneh
Shubhra Chandra
Min Zhou
Jonathan Vronsky
Songbai Yan
2026
We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90% while maintaining 99.8% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, intersample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs.
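The contrastive objective at the heart of models like ALF can be illustrated with a small, self-contained sketch. This is not the paper's implementation; it is a generic symmetric InfoNCE-style loss over a batch of paired embeddings (e.g., one modality's representation versus another's), where matching pairs are positives and all other in-batch pairings serve as negatives. All names are illustrative.

```python
import numpy as np

def info_nce_loss(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss over paired embeddings.

    Row i of emb_a and row i of emb_b form a positive pair; every other
    in-batch pairing acts as a negative.  Illustrative sketch only.
    """
    # L2-normalize so dot products are cosine similarities.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(logits))         # positives sit on the diagonal

    def xent(l):
        # Numerically stable cross-entropy against the diagonal labels.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the two directions (a -> b and b -> a).
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matching pairs together and pushes mismatched in-batch pairs apart, which is the general mechanism behind unified cross-modal representations.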
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution [1-3]. We combine existing mass-production results with modern approaches for loading classical data using "quantum read-only memory." We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order of magnitude or more for a variety of reasonably sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
A Computer Vision Problem in Flatland
Erin Connelly
Annalisa Crannell
Timothy Duff
Rekha R. Thomas
SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exists if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image.
Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL?
In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy.
We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data.
We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL.
Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL.
In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL.
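The join friction described above is easy to see in today's SQL. In this small illustration (hypothetical schema, standard SQLite via Python's sqlite3, not the paper's proposed extensions), the foreign key relationship is declared in the DDL, yet every query must still restate the join predicate by hand:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        -- The relationship is declared here...
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 12.0), (12, 2, 30.0);
""")

# ...but standard SQL cannot use the declared foreign key at query time:
# the ON clause below repeats information the schema already contains.
rows = con.execute("""
    SELECT c.name, SUM(o.total)
    FROM orders o JOIN customer c ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 111.5), ('Grace', 30.0)]
```

Extensions that let declared relationships drive joins would remove exactly this repetition, which is the kind of simplification the abstract describes.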
How many T gates are needed to approximate an arbitrary n-qubit quantum state to within a given precision ε? Improving on prior work of Low, Kliuchnikov, and Schaeffer, we show that the optimal asymptotic scaling is Θ(√(2^n log(1/ε)) + log(1/ε)) if we allow an unlimited number of ancilla qubits. We also show that this is the optimal T-count for implementing an arbitrary diagonal n-qubit unitary to within error ε. We describe an application to batched synthesis of single-qubit unitaries: we can approximate a tensor product of m = O(log log(1/ε)) arbitrary single-qubit unitaries to within error ε with the same asymptotic T-count as is required to approximate just one single-qubit unitary.
CrossCheck: Input Validation for WAN Control Systems
Rishabh Iyer
Isaac Keslassy
Sylvia Ratnasamy
Networked Systems Design and Implementation (NSDI) (2026) (to appear)
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages.
Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data).
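CrossCheck's internals are not reproduced here, but the flavor of validation it performs (comparing controller inputs against independently observed telemetry before they can trigger an outage) can be sketched as a simple cross-check. The data layout, threshold, and function name below are all illustrative assumptions, not the paper's design:

```python
def validate_demands(input_demands, telemetry_demands, threshold=0.05):
    """Flag controller-input demands that deviate from telemetry-derived
    estimates by more than `threshold` relative error.

    Illustrative sketch: inputs are dicts mapping a flow identifier to a
    demand value.  Returns the list of flow keys that fail validation.
    """
    suspect = []
    for flow, claimed in input_demands.items():
        observed = telemetry_demands.get(flow)
        if observed is None:
            # Input references a flow that telemetry never observed.
            suspect.append(flow)
            continue
        if observed == 0:
            if claimed != 0:
                suspect.append(flow)
            continue
        if abs(claimed - observed) / observed > threshold:
            suspect.append(flow)
    return suspect

# A 10% perturbation on flow "b" exceeds the 5% tolerance and is flagged.
print(validate_demands({"a": 100, "b": 110}, {"a": 100, "b": 100}))  # ['b']
```

A production system would of course need to tolerate noisy and missing telemetry, which is exactly the regime the abstract's false-positive results speak to.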
'It's still abuse': Community attitudes and perceptions on AI-generated image-based sexual abuse
Nicola Henry
Gemma Beard
Lisa Given
Information, Communication & Society (2026)
There are growing concerns about AI-generated image-based sexual abuse (AI-IBSA), also known as nonconsensual sexualized 'deepfakes.' Empirical research on AI-IBSA, however, remains very limited. This study surveyed 7231 respondents across Australia, the United Kingdom, and the United States to investigate community attitudes and perceptions on AI-IBSA. Through a vignette study, we explored the relationship between public familiarity with AI-IBSA, normative concerns about consent, and context-dependent judgments that vary based on the target's identity, relational status, and how the content was used. Our findings reveal strong condemnation of AI-IBSA, yet respondents demonstrated low familiarity with the technology and their views varied depending on particular contexts. AI-IBSA targeting intimate partners was viewed as more unacceptable than targeting celebrities, and content created solely for personal use was seen as less unacceptable than content intended for distribution. The study highlights the need for approaches that go beyond technical fixes and punitive measures, advocating for a multifaceted response that integrates ethical data governance, digital sexual literacy, and restorative justice approaches.
Although sound information extraction tasks appear distinct across the spectrum of sound classes and technologies, all inherently involve creating some form of "embedding"—be it discrete, as in textual tokens, or a continuous vector—to encapsulate relevant information from the audio signal for downstream use. This unifying framework allows us to re-evaluate sound information extraction by examining the optimality of current task-specific representations, the remaining quality headroom, and the potential for a single, robust sound embedding to generalize across diverse applications and sound types. To expedite research in these directions, a standardized evaluation benchmark is indispensable, mirroring the established benchmarks in the text and image domains. We present the Massive Sound Embedding Benchmark (MSEB) to serve this purpose. MSEB encompasses realistic tasks and datasets that reflect practical applications across diverse technologies and sound categories. Initial experimental findings indicate substantial headroom for improving prevalent information extraction methods. We encourage the sound processing community to contribute data and tasks to MSEB and to employ it to assess their algorithms for improved overall sound encoding.
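Benchmarks of this kind typically score an embedding model with nearest-neighbor retrieval metrics. As a minimal sketch (illustrative only; MSEB's actual tasks and metrics are defined by the benchmark itself), recall@k over cosine similarity looks like this:

```python
import numpy as np

def recall_at_k(query_emb, corpus_emb, ground_truth, k=5):
    """Fraction of queries whose ground-truth corpus item appears among
    the k nearest corpus embeddings by cosine similarity.

    ground_truth[i] is the corpus row index that is correct for query i.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T                           # (queries, corpus) similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches
    hits = [gt in row for gt, row in zip(ground_truth, topk)]
    return float(np.mean(hits))
```

The same machinery, with audio on one side and text or class prompts on the other, covers both the retrieval and classification styles of evaluation that embedding benchmarks report.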
Text-to-image generative models have demonstrated great performance in generating realistic images. These generations are assumed to reflect a deep understanding of visual scenes. One interesting question is whether these models possess the zero/few-shot generalization capabilities known from humans. For example, a human can see an example of a new object and a word associated with this object, then use their knowledge in a highly general way to recognize or imagine this novel object in a completely different setting or context. In this work, we test the hypothesis that text-to-image models may learn familiar objects better than novel objects. We use prompt tuning methods to learn these novel concepts while keeping the text-to-image models fixed. We also prompt tune the model to learn familiar concepts, and evaluate its generalization ability for novel objects compared to familiar objects by running generation in different contexts/environments. In addition, instead of initializing the embedding vectors from similar concepts, we use randomly initialized embedding vectors for both familiar and novel objects. Our human-survey evaluation results demonstrate that in some settings text-to-image models learn familiar objects better than novel objects.
Balanced coupling in electromagnetic circuits
Juan Atalaya
Sergei Isakov
Physical Review Applied, 23 (2025), pp. 024012
The rotating-wave approximation (RWA) is ubiquitous in the analysis of driven and coupled resonators. However, the limitations of the RWA seem to be poorly understood and in some cases the RWA disposes of essential physics. We investigate the RWA in the context of electrical circuits. Using a classical Hamiltonian approach, we find that by balancing electrical and magnetic components of the resonator drive or resonator-resonator coupling, the RWA can be made exact. This type of balance, in which the RWA is exact, has applications in superconducting qubits, where it suppresses nutation normally associated with strong Rabi driving. In the context of dispersive readout, balancing the qubit-resonator coupling changes the qubit leakage induced by the resonator drive but does not remove it in the case of the transmon qubit.
This note is a follow-up to Ref. [Naaman, IEEE TAS 2025], describing how to construct Josephson junction, inductor, and mutual inductance models using components that are available in the Keysight ADS core library.
Recently, Chevignard et al. proposed a way to factor n-bit RSA integers using only (0.5 + ε)n logical qubits. In this paper, I streamline Chevignard's algorithm and estimate its physical cost accounting for the overhead of error correction. I reduce its Toffoli count by more than 100x, and show that this implies a 2048-bit RSA integer could be factored in less than a week using fewer than one million noisy qubits (compared to 20 million in Gidney+Ekerå 2019). I make the same assumptions as in Gidney+Ekerå 2019: a square grid of qubits with nearest-neighbor connections, a gate error rate of 0.1%, a surface code cycle time of 1 microsecond, and a control system reaction time of 10 microseconds.
A Remote Sensing Vision-Language Foundation Model for Zero-Shot Tasks
Aviad Barzilai
Amr Helmy
Yotam Gigi
Vered Silverman
Yehonathan Refael
2025
Foundation models have revolutionized AI, particularly in visual-language tasks, achieving unparalleled performance across domains. Despite advancements, remote sensing (RS) remains underserved due to the lack of large-scale image-text datasets. This paper addresses the gap by introducing two novel datasets: RS-WebLI and Google Maps, specifically designed for training remote sensing vision-language models (RS-VLMs).
The RS-WebLI dataset leverages web images filtered for RS relevance, enriched with high-quality captions derived from associated alt-text. The Google Maps dataset utilizes Gemini, a multi-modal large language model, to generate accurate and descriptive captions by aligning Google Maps data with high-resolution satellite and aerial imagery. These datasets together encompass a vast and diverse array of remote sensing objects and contexts, forming a robust foundation for RS-specific tasks. The two datasets together incorporate around 20M image and text pairs.
We fine-tuned Mammut, a state-of-the-art (SOTA) vision-language model, using these datasets. The model employs a contrastive learning framework, enabling robust zero-shot capabilities. Moreover, the Mammut architecture incorporates a generative loss component, further enhancing its adaptability. To evaluate the model's zero-shot performance, we used two main methods. The first, zero-shot classification, tests the ability of the model to classify a remote sensing image into a pre-defined set of classes without training directly on the dataset. For this task we use the following RS image classification datasets: Functional Map of the World (FMOW), RESISC45, UCM Classification and SkyScript classification. For every dataset, we composed a set of sentences of the form "An aerial image of <class name>", and we used a simple nearest neighbor algorithm to find the best matching class for every image. The metric is the top-1 accuracy. The second evaluation method is zero-shot retrieval. For that task, we use the following remote sensing image-captions datasets: NWPU RESISC, UCM Captions, RSITMD and RSICD. Similarly to zero-shot classification, we use nearest neighbors on the model's output embedding to match every image to a class. Similarly to other works in the field, we present the average of the top-1, top-5 and top-10 recall scores.
The study also evaluates supervised learning regimes, where the VLMs are fine-tuned on task-specific datasets like FMOW and FloodNet. These models outperform traditional masked-image models, showcasing the advantage of leveraging vision-language pre-training for RS applications. To assess generalization, the Google Maps Hold-out dataset was introduced, excluding specific object types during training. Results indicate the model's strong ability to recognize unseen objects, validating its versatility.
This work establishes a comprehensive framework for developing RS-VLMs, addressing dataset limitations and model scalability. It sets a precedent for leveraging foundation models in RS, paving the way for enhanced zero-shot and fine-tuned applications in remote sensing analytics. Future directions include expanding dataset diversity and exploring advanced architectures to further push the boundaries of RS vision-language understanding.
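The zero-shot classification protocol described above (nearest neighbor between image embeddings and embeddings of per-class prompt sentences) can be sketched in a few lines. The embeddings are assumed to come from a vision-language model like the one described, which is not reproduced here; the function name is illustrative:

```python
import numpy as np

def zero_shot_classify(image_embeddings, class_text_embeddings):
    """Assign each image the index of the class whose prompt-text
    embedding is nearest by cosine similarity.

    image_embeddings:      (num_images, dim) array from the image encoder.
    class_text_embeddings: (num_classes, dim) array, one row per prompt
                           such as "An aerial image of <class name>".
    """
    # Normalize so the dot product equals cosine similarity.
    img = image_embeddings / np.linalg.norm(
        image_embeddings, axis=1, keepdims=True)
    txt = class_text_embeddings / np.linalg.norm(
        class_text_embeddings, axis=1, keepdims=True)
    # Top-1 class index per image; top-1 accuracy follows by comparing
    # these predictions against the dataset labels.
    return np.argmax(img @ txt.T, axis=1)
```

Swapping the class prompts for caption embeddings, and argmax for a top-k ranking, turns the same routine into the zero-shot retrieval evaluation the abstract describes.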