Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11202 publications
Managing and Securing Google's Fleet of Multi-Node Servers
Richard Hanley
Havard Skinnemoen
Andrés Lagar-Cavilla
Michael Wong
Jeff Andersen
Kishan Prasad
Patrick Leis
Shiva Rao
Chris Koch
Jad Baydoun
Anna Sapek
Communications of the ACM, 69:3 (2026), pp. 82 - 92
Preview abstract
Server hardware and software co-design for a secure, efficient cloud.
View details
Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
Alyssa Sheehan
Bianca Gallardo
Ying Wang
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
Preview abstract
Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility.
View details
Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
Michael Madaio
Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
Preview abstract
Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. While promising, in this article, I situate the recent calls for participatory design of educational AI systems within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities, rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place?
View details
KVCIS: Activation-Based Token Importance Prediction for Intelligent KV-Cache Compression
Zenodo (2026)
Preview abstract
We introduce KVCIS (KV-Cache Importance Scoring), a novel approach to KV-cache compression that predicts token importance from intermediate-layer activations before attention is computed. Unlike existing methods (H2O, StreamingLLM, Scissorhands) that make compression decisions based on attention scores computed during generation, KVCIS enables proactive compression at cache insertion time—determining how to store each token before paying the computational cost of attention. We discover a two-level importance structure in decoder-only transformers: the beginning-of-sequence (BOS) token acts as an "attention sink" receiving ~76% of attention, while the remaining ~24% is distributed across content tokens with 10-11× importance spread. A simple linear probe achieves R² = 0.998 overall and R² = 0.68–0.79 for discriminating among content tokens. Extensive validation across 3 model families (Llama, Mistral, Gemma), 8 layer depths, context lengths from 256 to 2048 tokens, and multiple downstream tasks demonstrates: 50% memory reduction with zero degradation on NarrativeQA (F1 = 0.064 matching baseline exactly), while uniform quantization degrades by 7.8% at the same compression ratio. KVCIS consistently achieves 5–8× better quality preservation than uniform quantization across all tested context lengths. The memory savings enable increased batch sizes and longer context support; the probe itself adds minimal overhead (~16KB direction vector, 0.06ms per token). This work extends activation-based probing from safety classification to inference optimization, demonstrating that intermediate-layer activations encode predictive signals about token importance for generation.
View details
Preview abstract
Semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. Most data modeling solutions are built as layers above SQL query engines, with bespoke query languages or APIs. The layered approach means that semantic models can’t be used directly in SQL queries. This paper focuses on an open problem in this space – can we define semantic models in SQL, and make them naturally queryable in SQL?
In parallel, graph query is becoming increasingly popular, including in SQL. SQL/PGQ extends SQL with an embedded subset of the GQL graph query language, adding property graph views and making graph traversal queries easy.
We explore a surprising connection: semantic data models are graphs, and defining graphs is a data modeling problem. In both domains, users start by defining a graph model, and need query language support to easily traverse edges in the graph, which means doing joins in the underlying data.
We propose some useful SQL extensions that make it easier to use higher-level data model abstractions in queries. Users can define a “semantic data graph” view of their data, encapsulating the complex business logic required to query the underlying tables correctly. Then they can query that semantic graph model easily with SQL.
Our SQL extensions are useful independently, simplifying many queries – particularly, queries with joins. We make declared foreign key relationships usable for joins at query time – a feature that seems obvious but is notably missing in standard SQL.
In combination, these extensions provide a practical approach to extend SQL incrementally, bringing semantic modeling and graph query together with the relational model and SQL.
View details
Exponential quantum advantage in processing massive classical data
Haimeng Zhao
Alexander Zlokapa
John Preskill
Hsin-Yuan (Robert) Huang
arXiv:2604.07639 (2026)
Preview abstract
Broadly applicable quantum advantage, particularly in classical data processing and machine learning, has been a fundamental open problem. In this work, we prove that a small quantum computer of polylogarithmic size can perform large-scale classification and dimension reduction on massive classical data by processing samples on the fly, whereas any classical machine achieving the same prediction performance requires exponentially larger size. Furthermore, classical machines that are exponentially larger yet below the required size need superpolynomially more samples and time. We validate these quantum advantages in real-world applications, including single-cell RNA sequencing and movie review sentiment analysis, demonstrating four to six orders of magnitude reduction in size with fewer than 60 logical qubits. These quantum advantages are enabled by quantum oracle sketching, an algorithm for accessing the classical world in quantum superposition using only random classical data samples. Combined with classical shadows, our algorithm circumvents the data loading and readout bottleneck to construct succinct classical models from massive classical data, a task provably impossible for any classical machine that is not exponentially larger than the quantum machine. These quantum advantages persist even when classical machines are granted unlimited time or if BPP=BQP, and rely only on the correctness of quantum mechanics. Together, our results establish machine learning on classical data as a broad and natural domain of quantum advantage and a fundamental test of quantum mechanics at the complexity frontier.
View details
Securing Elliptic Curve Cryptocurrencies against Quantum Vulnerabilities: Resource Estimates and Mitigations
Michael Broughton
Thiago Bergamaschi
Justin Drake
Dan Boneh
arXiv:2603.28846 (2026)
Preview abstract
This whitepaper seeks to elucidate implications that the capabilities of developing quantum architectures have on blockchain vulnerabilities and mitigation strategies. First, we provide new resource estimates for breaking the 256-bit Elliptic Curve Discrete Logarithm Problem, the core of modern blockchain cryptography. We demonstrate that Shor's algorithm for this problem can execute with either <1200 logical qubits and <90 million Toffoli gates or <1450 logical qubits and <70 million Toffoli gates. In the interest of responsible disclosure, we use a zero-knowledge proof to validate these results without disclosing attack vectors. On superconducting architectures with 1e-3 physical error rates and planar connectivity, those circuits can execute in minutes using fewer than half a million physical qubits. We introduce a critical distinction between fast-clock (such as superconducting and photonic) and slow-clock (such as neutral atom and ion trap) architectures. Our analysis reveals that the first fast-clock CRQCs would enable on-spend attacks on public mempool transactions of some cryptocurrencies. We survey major cryptocurrency vulnerabilities through this lens, identifying systemic risks associated with advanced features in some blockchains such as smart contracts, Proof-of-Stake consensus, and Data Availability Sampling, as well as the enduring concern of abandoned assets. We argue that technical solutions would benefit from accompanying public policy and discuss various frameworks of digital salvage to regulate the recovery or destruction of dormant assets while preventing adversarial seizure. We also discuss implications for other digital assets and tokenization as well as challenges and successful examples of the ongoing transition to Post-Quantum Cryptography (PQC). Finally, we urge all vulnerable cryptocurrency communities to join the ongoing migration to PQC without delay.
View details
Agentic AI Infrastructure in Practice: Learn These Key Hurdles to Deploy Production AI Agents Efficiently
https://swisscognitive.ch/ (2026)
Preview abstract
The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles.
This article categorizes these challenges into three primary domains:
Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources.
People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape.
The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents.
The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value.
View details
A Computer Vision Problem in Flatland
Erin Connelly
Annalisa Crannell
Timothy Duff
Rekha R. Thomas
SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
Preview abstract
When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image.
View details
Preview abstract
Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, in which the distribution of knowledge is highly long-tailed, with most appearing infrequently. While scaling has improved average-case performance, persistent failures on low-frequency, domain-specific, cultural, and temporal knowledge remain poorly characterized. This paper develops a structured taxonomy and analysis of long-tail knowledge in large language models, synthesizing prior work across technical and sociotechnical perspectives.
We organize the literature along four complementary axes: how long-tail knowledge is defined, the mechanisms by which it is lost or distorted during training and inference, the technical interventions proposed to mitigate these failures, and the implications of these failures for fairness, accountability, transparency, and user trust. We further examine how existing evaluation practices obscure tail behavior and complicate accountability for rare but consequential failures. The paper concludes by identifying open challenges related to privacy, sustainability, and governance that constrain long-tail knowledge representation. Taken together, this paper provides a unifying conceptual framework for understanding how long-tail knowledge is defined, lost, evaluated, and manifested in deployed language model systems.
View details
Gaze Target Estimation Anywhere with Concepts
Xu Cao
Houze Yang
Vipin Gunda
Inki Kim
Jim Rehg
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
Preview abstract
Estimating human gaze targets in-the-wild is a formidable challenge. Existing computer vision algorithms rely on brittle, multi-stage pipelines that require explicit inputs like head bounding boxes and human pose, causing initial detection errors to cascade and lead to system failure. To overcome this, we introduce the \textbf{Promptable Gaze Target Estimation (PGE)} task, a new end-to-end, concept-driven paradigm. PGE conditions gaze prediction on flexible user text or visual prompts (e.g., "the boy in the red shirt" or "person in point [0.52, 0.48]") to identify a specific subject's target, which eliminates the rigid dependency on intermediate localization cues. We develop a scalable data engine to generate \textbf{Gaze-Co}, a dataset and benchmark of 120K high-quality, prompt-annotated image pairs. We also propose \textbf{AnyGaze}, the first model designed for PGE. AnyGaze uses a Transformer-based detector to fuse features from frozen encoders and simultaneously solves subject localization, in/out-of-frame presence, and gaze target heatmap estimation. AnyGaze achieves state-of-the-art performance on standard gaze target estimation benchmarks, setting a strong baseline for this new problem even on a difficult out-of-domain, real-world clinical dataset. We will open-source the AnyGaze model and the Gaze-Co benchmark.
View details
Preview abstract
Enterprise service delivery platforms, while vital for HR operations, create significant challenges in managing the risks of Personally Identifiable Information (PII) exposure. The integration of Generative AI offers new efficiencies but also amplifies these risks. Existing solutions—ranging from manual redaction and rule-based Data Loss Prevention (DLP) to inflexible data masking—fail to provide a nuanced, integrated approach. This paper introduces the Dual-Mode Privacy Guard (DMPG), a conceptual framework that establishes a model for Augmented Compliance. The framework provides a "defense-in-depth" strategy built on three pillars: (1) a Zero-Trust AI Foundation leveraging a verifiable, non-retention API gateway to ensure data privacy; (2) a proactive "Guardrail" that uses AI to detect and flag potential PII for human-in-the-loop review; and (3) an on-demand "Tool" that allows users to create securely anonymized data assets. By differentiating between proactive monitoring and reactive utility, the DMPG shifts the compliance paradigm from a manual burden to an AI-assisted process that enhances, rather than replaces, human oversight. This paper details the framework’s platform-agnostic architecture, using Salesforce as a reference implementation, and argues for its novelty as a model for operationalizing privacy principles within modern enterprise systems.
View details
Preview abstract
Validating conversational artificial intelligence (AI) for regulated medical software applications may present challenges, as static test datasets and manual review may be limited in identifying emergent, conversational anomalies. A multi-agent AI system may be configured in a closed-loop for automated validation. The system can, for example, utilize an end user persona simulator agent to generate prompts for a target model and a domain /regulatory expert adjudicator agent to evaluate the target model’s responses against a configurable rubric. A meta-analysis agent can analyze anomalies to identify underlying vulnerabilities, which may then be used to programmatically synthesize new adversarial personas. This adaptive process can generate evidence to support regulatory compliance and continuous performance monitoring for medical software algorithms systems.
View details
Preview abstract
In some multi-stage software build pipelines, downstream compiler errors may be reported against ephemeral, machine-generated intermediate artifacts rather than original, human-written source code, which can make remediation challenging. A system and method may address this by intercepting a downstream error, mapping its location back to the original source file, and programmatically injecting a dormant suppression tag into the original source code. During a subsequent build, an intermediate transpiler can propagate this tag into a newly generated intermediate artifact. In the intermediate file, the tag may become active and be recognized by the downstream compiler as a directive to suppress the specific error. This approach can facilitate an automated remediation process for certain build failures that avoids direct modification of ephemeral files and uses the original source code as a record for suppression.
View details
Preview abstract
Managing compiler build errors that can arise during infrastructure upgrades in large, polyglot codebases may be challenging, as manual remediation can be slow and some automated tools may not support modern language syntax. A system can provide automated error remediation by ingesting compiler diagnostics and analyzing source code using an Abstract Syntax Tree (AST). A recursive scope resolution algorithm, for example, can traverse the AST to identify a specific and narrowly-scoped code block at which to apply an error suppression. Conversely, this algorithmic complexity can be bypassed when lexical scope resolution is not required, and the system can identify the specific location of error suppressions directly from the error's exact coordinates. The system may then generate and apply language-specific patches, such as structured comments for JavaScript source files or line-scoped comments for TypeScript source files, for example, by using a transactional rewrite engine. This approach can provide a scalable method for managing automated code remediation, which may facilitate infrastructure upgrades by reducing the need for manual intervention.
View details