Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11321 publications
    SNPeek: Side-Channel Analysis for Privacy Applications on Confidential VMs
    Ruiyi Zhang
    Albert Cheu
    Adria Gascon
    Michael Schwarz
    Octavian Suciu
    Network and Distributed System Security (NDSS) (2026)
    Preview abstract Confidential virtual machines (CVMs) based on trusted execution environments (TEEs) enable new privacy-preserving solutions. But CVMs are not a privacy panacea, as they are vulnerable to side-channel attacks that may compromise confidentially of workloads. In this work, we develop the FARFETCH’D framework to help developers evaluate side-channel assisted privacy attacks that are broadly applicable to CVMs. The privacy reduction due to these attacks heavily depend on the execution environment and the workload, which varies vastly:What are avail-able attack primitives? How does the particular privacy work-load behave?This makes manual investigation and efficiently mitigating software-based side channels a cumbersome and impossible task. FARFETCH’D solves this challenge by providing a set of configurable attack primitives that can execute on real CVM hardware and automated ML-based analysis pipelines. We evaluate the effectiveness of FARFETCH’D on privacy-preserving workloads. Our results show that our approach is effective at pinpointing the vulnerability of privacy apps against side channels and help evaluating mitigation based on oblivious memory and differential privacy. View details
    Preview abstract Some artificial intelligence provisioning models that function as tools for human users or rely on labor arbitrage can present challenges for organizations, such as managing personnel rather than task outcomes and introducing data security risks. An architecture is described for an outcome-based synthetic labor market in which autonomous computational agents can be compensated based on verified task completion. The framework can leverage trusted execution environments to create secure hardware enclaves for processing sensitive data, which can render the data cryptographically inaccessible to a host system or agent provider. This approach can facilitate a secure, transactional market for autonomous professional execution, which may enable a shift from managing labor resources to procuring verified outcomes from a pool of specialized agents. View details
    LiveSVG: Zero-Shot SVG Animation via Video Generation
    Matan Levy
    Ran Margolin
    Bar Cavia
    Dvir Samuel
    Shmuel Peleg
    Alex Rav Acha
    Arik Shamir
    Dani Lischinski
    Google (2026)
    Preview abstract We introduce LiveSVG, a zero-shot approach for generating Scalable Vector Graphics (SVG) animations using video diffusion models. Current SVG animation methods struggle with complex motions: LLM-based code synthesis fails to express fine, non-rigid Bézier deformations, while Score Distillation Sampling (SDS) provides noisy gradients and often requires category-specific priors like skeletons. In contrast, LiveSVG fits vector geometry directly to an explicitly generated target video. Given an input SVG image and a motion prompt, we generate a previewable target video using a frozen image-to-video model, then fit the original SVG to this video via differentiable rendering. Our fitting stage is skeleton-free, utilizing a dual-level motion representation that combines per-group homographies for coarse articulation with per-path Bézier control-point offsets for local deformations. To resolve color-induced correspondence ambiguities during pixel-wise fitting, we introduce a novel sphere-packing recolorization strategy. We also present ChallengeSVG, a benchmark of complex, multi-object scenes that exposes the limitations of prior work. Evaluations demonstrate that LiveSVG significantly outperforms existing methods on both AniClipart and ChallengeSVG, establishing direct reference-video fitting as a practical, robust route to prompt-aligned and fully editable vector animation. View details
    Preview abstract When managing complex, unpredictable (non-deterministic) AI agents using simple, fixed control systems (like finite state machines), operational failures and accountability issues often arise. This document introduces a probabilistic governance and telemetry framework to resolve these problems. Instead of following a rigid sequence of steps, this framework defines a multi-dimensional operational boundary, a 'behavioral volume', and assigns the agent a goal. This allows the agent to use its own reasoning to achieve the goal while remaining within the defined boundaries. A separate telemetry layer monitors the agent's actions by calculating metrics, such as alignment scores and drift velocity, to measure how much the agent deviates from its intended behavior. This system provides a method for guiding, monitoring, and securing autonomous agents, effectively managing the performance and security of an unpredictable AI workforce in complex environments. View details
    Preview abstract This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation. View details
    Preview abstract Large Language Models utilizing reasoning techniques improve task performance but incur significant latency and token costs due to verbose generation. Existing automatic prompt optimization(APO) frameworks target task accuracy exclusively at the expense of generating long reasoning traces. We propose Cost-Regularized Optimization of Prompts (CROP), an APO method that introduces regularization on response length by generating textual feedback in addition to standard accuracy feedback. This forces the optimization process to produce prompts that elicit concise responses containing only critical information and reasoning. We evaluate our approach on complex reasoning datasets, specifically GSM8K, LogiQA and BIG-Bench Hard. We achieved an 80.6% reduction in token consumption while maintaining competitive accuracy, seeing only a nominal decline in performance. This presents a pragmatic solution for deploying token-efficient and cost-effective agentic AI systems in production pipelines. View details
    Preview abstract We introduce KVCIS (KV-Cache Importance Scoring), a novel approach to KV-cache compression that predicts token importance from intermediate-layer activations before attention is computed. Unlike existing methods (H2O, StreamingLLM, Scissorhands) that make compression decisions based on attention scores computed during generation, KVCIS enables proactive compression at cache insertion time—determining how to store each token before paying the computational cost of attention. We discover a two-level importance structure in decoder-only transformers: the beginning-of-sequence (BOS) token acts as an "attention sink" receiving ~76% of attention, while the remaining ~24% is distributed across content tokens with 10-11× importance spread. A simple linear probe achieves R² = 0.998 overall and R² = 0.68–0.79 for discriminating among content tokens. Extensive validation across 3 model families (Llama, Mistral, Gemma), 8 layer depths, context lengths from 256 to 2048 tokens, and multiple downstream tasks demonstrates: 50% memory reduction with zero degradation on NarrativeQA (F1 = 0.064 matching baseline exactly), while uniform quantization degrades by 7.8% at the same compression ratio. KVCIS consistently achieves 5–8× better quality preservation than uniform quantization across all tested context lengths. The memory savings enable increased batch sizes and longer context support; the probe itself adds minimal overhead (~16KB direction vector, 0.06ms per token). This work extends activation-based probing from safety classification to inference optimization, demonstrating that intermediate-layer activations encode predictive signals about token importance for generation. View details
    Gaze Target Estimation Anywhere with Concepts
    Xu Cao
    Houze Yang
    Vipin Gunda
    Inki Kim
    Jim Rehg
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
    Preview abstract Estimating human gaze targets in-the-wild is a formidable challenge. Existing computer vision algorithms rely on brittle, multi-stage pipelines that require explicit inputs like head bounding boxes and human pose, causing initial detection errors to cascade and lead to system failure. To overcome this, we introduce the \textbf{Promptable Gaze Target Estimation (PGE)} task, a new end-to-end, concept-driven paradigm. PGE conditions gaze prediction on flexible user text or visual prompts (e.g., "the boy in the red shirt" or "person in point [0.52, 0.48]") to identify a specific subject's target, which eliminates the rigid dependency on intermediate localization cues. We develop a scalable data engine to generate \textbf{Gaze-Co}, a dataset and benchmark of 120K high-quality, prompt-annotated image pairs. We also propose \textbf{AnyGaze}, the first model designed for PGE. AnyGaze uses a Transformer-based detector to fuse features from frozen encoders and simultaneously solves subject localization, in/out-of-frame presence, and gaze target heatmap estimation. AnyGaze achieves state-of-the-art performance on standard gaze target estimation benchmarks, setting a strong baseline for this new problem even on a difficult out-of-domain, real-world clinical dataset. We will open-source the AnyGaze model and the Gaze-Co benchmark. View details
    Preview abstract This whitepaper seeks to elucidate implications that the capabilities of developing quantum architectures have on blockchain vulnerabilities and mitigation strategies. First, we provide new resource estimates for breaking the 256-bit Elliptic Curve Discrete Logarithm Problem, the core of modern blockchain cryptography. We demonstrate that Shor's algorithm for this problem can execute with either <1200 logical qubits and <90 million Toffoli gates or <1450 logical qubits and <70 million Toffoli gates. In the interest of responsible disclosure, we use a zero-knowledge proof to validate these results without disclosing attack vectors. On superconducting architectures with 1e-3 physical error rates and planar connectivity, those circuits can execute in minutes using fewer than half a million physical qubits. We introduce a critical distinction between fast-clock (such as superconducting and photonic) and slow-clock (such as neutral atom and ion trap) architectures. Our analysis reveals that the first fast-clock CRQCs would enable on-spend attacks on public mempool transactions of some cryptocurrencies. We survey major cryptocurrency vulnerabilities through this lens, identifying systemic risks associated with advanced features in some blockchains such as smart contracts, Proof-of-Stake consensus, and Data Availability Sampling, as well as the enduring concern of abandoned assets. We argue that technical solutions would benefit from accompanying public policy and discuss various frameworks of digital salvage to regulate the recovery or destruction of dormant assets while preventing adversarial seizure. We also discuss implications for other digital assets and tokenization as well as challenges and successful examples of the ongoing transition to Post-Quantum Cryptography (PQC). Finally, we urge all vulnerable cryptocurrency communities to join the ongoing migration to PQC without delay. View details
    Expert evaluation of LLM world models: A high-Tc superconductivity case study
    Haoyu Guo
    Maria Tikhanovskaya
    Paul Raccuglia
    Alexey Vlaskin
    Chris Co
    Scott Ellsworth
    Matthew Abraham
    Lizzie Dorfman
    Peter Armitage
    Chunhan Feng
    Antoine Georges
    Olivier Gingras
    Dominik Kiese
    Steve Kivelson
    Vadim Oganesyan
    Brad Ramshaw
    Subir Sachdev
    Senthil Todadri
    John Tranquada
    Eun-Ah Kim
    Proceedings of the National Academy of Sciences (2026)
    Preview abstract Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. This work evaluates the performance of six different LLM-based systems for answering scientific literature questions, including commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. We conduct a rigorous expert evaluation of the systems in the domain of high-temperature cuprate superconductors, a research area that involves material science, experimental physics, computation, and theoretical physics. We use an expert-curated database of 1726 scientific papers and a set of 67 expert-formulated questions. The evaluation employs a multi-faceted rubric assessing balanced perspectives, factual comprehensiveness, succinctness, evidentiary support, and image relevance. Our results demonstrate that RAG-based systems, powered by curated data and multimodal retrieval, outperform existing closed models across key metrics, particularly in providing comprehensive and well-supported answers, and in retrieving relevant visual information. This study provides valuable insights into designing and evaluating specialized scientific literature understanding systems, particularly with expert involvement, while also highlighting the importance of rich, domain-specific data in such systems. View details
    Preview abstract Contrail microphysical simulations and climate simulations have indicated that contrail cirrus cause a substantial fraction of aviation’s climate impact. While the approximations and parameter selections in these simulations have been well-validated over the past two decades, the heat trapping of contrails has not been observed using satellite data beyond a few hours. This is because contrails lose their linear shape after a few hours, making them difficult to distinguish from natural cirrus clouds. Here we provide satellite-driven analysis of long-lived heat trapping by contrails over North and South America. We aggregate a dataset of GOES-16 estimated outgoing longwave radiation and advected trace density of flight paths, and apply causal inference to discern the effect of contrails while controlling for radiative and cloud confounders. As a means of validation, we also generate synthetic datasets with known ground truth, and confirm that applying the causal inference method is able to recover the synthetic ground truth. Since this method yields an estimate which has some differences from both “instantaneous radiative forcing” (iRF) and “effective radiative forcing” (ERF) estimates which have been reported in the literature so far, we introduce the new term “observational radiative forcing, 12 hours” (oRF12). Our analysis estimates the longwave oRF12 from contrails over the Americas averaged 47.9 gigajoules per flight kilometer (95% CI: 31 to 52 GJ/km) during April 2019 to April 2020. View details
    Managing and Securing Google's Fleet of Multi-Node Servers
    Richard Hanley
    Havard Skinnemoen
    Andrés Lagar-Cavilla
    Michael Wong
    Jon McCune
    Jeff Andersen
    Kishan Prasad
    Patrick Leis
    Shiva Rao
    Chris Koch
    Jad Baydoun
    Anna Sapek
    Communications of the ACM, 69:3 (2026), pp. 82 - 92
    Preview abstract Server hardware and software co-design for a secure, efficient cloud. View details
    Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates
    Shengjie Zhu
    Xiaoming Liu
    Vincent Chu
    International Conference on 3D Vision (2026)
    Preview abstract Structure-from-Motion (SfM) is a classical 3D vision task for recovering camera parameters and scene geometry from multi-view images. Recent advances in deep learning enable accurate monocular depth estimation (MDE) that infers structure from a single image without depending on camera motion. But integrating MDE into SfM remains challenging. Unlike classical triangulated sparse pointclouds, MDE produces dense depthmaps with significantly higher error variance. Inspired by modern RANSAC estimators, we propose a Marginalized Bundle Adjustment (MBA) to accommodate MDE error variance with its density. With MBA, we show that MDE depthmaps are sufficiently accurate to support SoTA or competitive results in Structure-from-Motion and camera relocalization. Our benchmark demonstrates consistent remarkable results from two-view, few-frames small multiview, to thousands-frames large multiview system. Our method highlights the significant potential of MDE on multi-view 3D vision tasks. View details
    Improved Differentially Private Algorithms for Rank Aggregation
    Phanu Vajanopath
    Quentin Hillebrand
    Vorapong Suppakitpaisarn
    AAAI (2026)
    Preview abstract Rank aggregation is a task of combining the rankings of items from multiple users into a single ranking that best represents the users' rankings. Alabi et al. (AAAI'22) presents differentially-private (DP) polynomial-time approximation schemes (PTASes) and 5-approximation algorithms with certain additive errors for the Kemeny rank aggregation problem in both central and local models. In this paper, we present improved DP PTASes with smaller additive error in the central model. Furthermore, we are first to study the footrule rank aggregation problem under DP. We give a near-optimal algorithm for this problem; as a corollary, this leads to 2-approximation algorithms with the same additive error as the 5-approximation algorithms of Alabi et al. for the Kemeny rank aggregation problem in both central and local models. View details
    Preview abstract The remarkable success of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in 2D computer vision has catalyzed significant research into their adaptation for the complex domain of 3D analysis. However, a fundamental dichotomy exists between the regular, dense grid of 2D images and the irregular, sparse nature of 3D data formats such as point clouds and meshes. This paper provides a comprehensive survey and a novel intellectual framework for navigating this burgeoning field. Our core contribution is a new taxonomy that organizes adaptation strategies into three distinct families: (1) Data-centric methods, which project 3D data into 2D formats to leverage off-the-shelf 2D models; (2) Architecture-centric methods, which design intrinsic network modules to directly process 3D data; and (3) Hybrid methods, which synergistically combine pre-trained 2D features with 3D modeling processing pipelines to benefit from both rich visual priors and explicit geometric reasoning. Through this taxonomic lens, we conduct a systematic review and qualitative synthesis of the field. We illuminate the fundamental trade-offs between these families concerning computational complexity, reliance on large-scale pre-training, and the preservation of geometric inductive biases. Based on this analysis, we identify and discuss critical open challenges and chart promising future research directions, including the development of 3D foundation models, advancements in self-supervised learning for geometric data, and the deeper integration of multi-modal signals. This survey serves as an essential resource and roadmap for researchers seeking to understand and advance the state-of-the-art in 3D computer vision. View details
    ×