Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11318 publications
    LiveSVG: Zero-Shot SVG Animation via Video Generation
    Matan Levy
    Ran Margolin
    Bar Cavia
    Dvir Samuel
    Shmuel Peleg
    Alex Rav Acha
    Arik Shamir
    Dani Lischinski
    Google (2026)
    Preview abstract We introduce LiveSVG, a zero-shot approach for generating Scalable Vector Graphics (SVG) animations using video diffusion models. Current SVG animation methods struggle with complex motions: LLM-based code synthesis fails to express fine, non-rigid Bézier deformations, while Score Distillation Sampling (SDS) provides noisy gradients and often requires category-specific priors like skeletons. In contrast, LiveSVG fits vector geometry directly to an explicitly generated target video. Given an input SVG image and a motion prompt, we generate a previewable target video using a frozen image-to-video model, then fit the original SVG to this video via differentiable rendering. Our fitting stage is skeleton-free, utilizing a dual-level motion representation that combines per-group homographies for coarse articulation with per-path Bézier control-point offsets for local deformations. To resolve color-induced correspondence ambiguities during pixel-wise fitting, we introduce a novel sphere-packing recolorization strategy. We also present ChallengeSVG, a benchmark of complex, multi-object scenes that exposes the limitations of prior work. Evaluations demonstrate that LiveSVG significantly outperforms existing methods on both AniClipart and ChallengeSVG, establishing direct reference-video fitting as a practical, robust route to prompt-aligned and fully editable vector animation. View details
    Preview abstract Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated on the StereoSet and Contract-NLI datasets using Gemma-3 4B, PLD improved Macro F1 scores from 57\% to 90.0\% and 67\% to 83\% respectively, enabling this compact model to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices. View details
    Unveiling the Global Landscape of Android Security Updates
    Haiyun Deng
    Abbas Acar
    Esteban Luques
    Harun Oz
    Ahmet Aris
    Selcuk Uluagac
    IEEE Transactions on Dependable and Secure Computing (2026)
    Preview abstract Android is the world’s leading mobile operating system, with over three billion active devices. Detecting vulnerabilities and ensuring timely patch deployment are critical to maintaining security. The Android Open Source Project (AOSP) has enhanced the transparency of security updates through Security Patch Levels. However, challenges related to update speed and availability persist. In 2022, Google reported that half of the zero-day vulnerabilities discovered in the wild were variations of vulnerabilities that had already been patched. Recent research mainly highlights delays in update distribution, often attributing them to fragmentation and focusing primarily on flagship devices or limited time-frames. Our approach takes a device-centric perspective to investigate Android update patterns, analyzing 567K security update records from 2014 to 2024, covering 904 distinct devices from six key Original Equipment Manufacturers (OEMs) across 98 countries. Our extensive analysis revealed notable differences in update release timing across OEMs, device types, and regions. Our study also examines documented vulnerabilities and weaknesses, while assessing OEM compliance with Android security guidelines. Our study shows that ∼89.7% of vulnerabilities on unpatched Android devices are exploitable without user interaction and with low attack complexity. We also identified delays linked to fragmentation and OEM-specific challenges, and provide actionable insights for improvement. View details
    Preview abstract Optical health sensing algorithms, such as SpO2, sleep monitoring, and metabolic health sensing, critically depend on the accurate measurement of optical emission from Light Emitting Diodes (LEDs) transmitted through user tissue and detected by a photodiode (PD). A significant challenge to the reliability of these measurements is the inherent degradation of LED optical emission intensity over time due to device aging. This degradation can confound the physiological changes being monitored. Our work quantifies the impact of LED aging on sensor signal integrity, specifically examining the Current Transfer Ratio (CTR), which is a key metric defining the ratio of received photocurrent to the LED drive current used for transmission in various health sensing algorithms. We investigate the degradation characteristics across LEDs of different wavelengths. Our findings indicate a relative CTR change due to degradation ranging from 1% to 8% within 100 hours of continuous operation which translates to approximately 3.5 to 7 years of device lifetime. Furthermore, we explore the non-linearity of this degradation and the observed initial ”overshoot” phenomenon in the CTR during aging. We discuss how understanding these dynamics could inform the development of robust specifications for different physiological sensing algorithms. Finally, we present several potential solutions to mitigate the effects of LED aging. During the product design phase, integrating a calibrating photodiode or compensating circuitry around the LED can help preemptively address degradation. In the application space, run-time calibration strategies employing two differently degraded optical paths offer a promising approach to maintain measurement accuracy. View details
    GenAI on Google Cloud: Enterprise Generative AI Systems and AI Agents
    Ayo Adedeji
    Lavi Nigam
    Stephanie Gervasi
    O'Reilly Media, Inc. (2026)
    Preview abstract In today's AI landscape, success depends not just on prompting large language models but on orchestrating them into intelligent systems that are scalable, compliant, and cost-effective. GenAI on Google Cloud is your hands-on guide to bridging that gap. Whether you're an ML engineer or an enterprise leader, this book offers a practical game plan for taking agentic systems from prototype to production. Written by practitioners with deep experience in AgentOps, data engineering, and GenAI infrastructure, this guide takes you through real-world workflows from data prep and deployment to orchestration and integration. With concrete examples, field-tested frameworks, and honest insights, you'll learn how to build agentic systems that deliver measurable business value. > Bridge the production gap that stalls 90% of vertical AI initiatives using systematic deployment frameworks > Navigate AgentOps complexities through practical guidance on orchestration, evaluation, and responsible AI practices > Build robust multimodal systems for text, images, and video using proven agent architectures > Optimize for scale with strategies for cost management, performance tuning, and production monitoring View details
    Preview abstract Generative AI is reshaping software development, yet its psychological impact remains under-researched. During May and August 2025 we conducted reflexive thematic analysis of interviews with 12 senior engineers (≥5 years experience) recruited from Western technology hubs to explore shifts in professional identity. We identify a central transition from "coder to conductor," where AI acts as a cognitive partner. Key findings include: (1) a re-architecting of focus from implementation to strategy; (2) a shift in productivity metrics from output to impact; and (3) a dual-impact on agency, where AI empowers autonomy but threatens competence through de-skilling anxieties. These findings suggest that as implementation becomes commoditised, organisational training and career progression must prioritise architectural mastery and metacognitive oversight to ensure sustained developer motivation and system integrity. View details
    Preview abstract Large language models (LLMs) are trained on web-scale corpora that exhibit steep power-law distributions, in which the distribution of knowledge is highly long-tailed, with most appearing infrequently. While scaling has improved average-case performance, persistent failures on low-frequency, domain-specific, cultural, and temporal knowledge remain poorly characterized. This paper develops a structured taxonomy and analysis of long-tail knowledge in large language models, synthesizing prior work across technical and sociotechnical perspectives. We organize the literature along four complementary axes: how long-tail knowledge is defined, the mechanisms by which it is lost or distorted during training and inference, the technical interventions proposed to mitigate these failures, and the implications of these failures for fairness, accountability, transparency, and user trust. We further examine how existing evaluation practices obscure tail behavior and complicate accountability for rare but consequential failures. The paper concludes by identifying open challenges related to privacy, sustainability, and governance that constrain long-tail knowledge representation. Taken together, this paper provides a unifying conceptual framework for understanding how long-tail knowledge is defined, lost, evaluated, and manifested in deployed language model systems. View details
    Towards AI as a Collaborative Partner: A Taxonomy of AI Agent Behavior in Software Engineering
    Sherry Y. Shi
    Proceedings of the 3rd ACM International Conference on AI-Powered Software (AIware '26), ACM, Montreal, QC, Canada (2026) (to appear)
    Preview abstract The ongoing transition of Large Language Models (LLMs) in software engineering from one-shot code generators into agentic partners requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an interactive software engineering (SWE) agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable SWE agent behaviors, synthesized from 91 sets of developer-defined rules for SWE agents and validated through interviewing 15 experienced professional developers. In this taxonomy, we identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the Developer. These findings offer a concrete vocabulary for aligning SWE agent behavior with developer preferences, enabling researchers and practitioners to move beyond correctness-only benchmarks and start designing evaluations that reflect the socio-technical nature of professional software development in enterprises. View details
    Preview abstract This study examines the psychological and ethical implications of generative-AI chatbot use among youth, introducing the CTRL framework (Cognitive Trust, Reliance, and Learning Diminution) to explain how repeated use fosters cognitive offloading and reduced verification behavior. Survey data from 420 participants analyzed through factor analysis and structural equation modeling reveal that higher trust predicts greater reliance and diminished critical evaluation, alongside elevated concerns around privacy and academic integrity. Findings highlight the need for AI literacy and responsible design to mitigate unintended cognitive impacts. View details
    Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
    Preview abstract Contrail microphysical simulations and climate simulations have indicated that contrail cirrus cause a substantial fraction of aviation’s climate impact. While the approximations and parameter selections in these simulations have been well-validated over the past two decades, the heat trapping of contrails has not been observed using satellite data beyond a few hours. This is because contrails lose their linear shape after a few hours, making them difficult to distinguish from natural cirrus clouds. Here we provide satellite-driven analysis of long-lived heat trapping by contrails over North and South America. We aggregate a dataset of GOES-16 estimated outgoing longwave radiation and advected trace density of flight paths, and apply causal inference to discern the effect of contrails while controlling for radiative and cloud confounders. As a means of validation, we also generate synthetic datasets with known ground truth, and confirm that applying the causal inference method is able to recover the synthetic ground truth. Since this method yields an estimate which has some differences from both “instantaneous radiative forcing” (iRF) and “effective radiative forcing” (ERF) estimates which have been reported in the literature so far, we introduce the new term “observational radiative forcing, 12 hours” (oRF12). Our analysis estimates the longwave oRF12 from contrails over the Americas averaged 47.9 gigajoules per flight kilometer (95% CI: 31 to 52 GJ/km) during April 2019 to April 2020. View details
    Preview abstract The rapid adoption of agentic systems powered by large language models (LLMs) introduces significant security challenges distinct from plain conversational models, particularly concerning prompt injection and tool misuse due to their dynamic personas and real- world tool interactions. This paper investigates the effectiveness of hardened security prompting in a task-oriented multi-agent framework, using a coding assistant as a representative case study. We com- pare a baseline ”unhardened” agent against a ”hard- ened” version equipped with explicit security guide- lines applied across all sub-agents. Our evaluation across 150+ single-turn and 32 multi-turn attack sce- narios demonstrates that prompt hardening dramat- ically improves resilience. With a simple, approxi- mately 500-token security hardener, single-turn fail- ure rates dropped from 19.48% to 2.60%, while multi- turn failure rates decreased from 75.00% to 46.88%. Furthermore, we show that successfully bypassing the hardened agent requires significantly more adversar- ial effort and a greater number of chat turns. How- ever, the analysis also reveals a critical shift in vul- nerability taxonomy: as direct attacks fail, adver- saries exploit the agent’s core functionality via ”Func- tional Wrappers” (Intent Obfuscation), highlighting a residual risk that necessitates a shift in the defen- sive paradigm from static filters to dynamic runtime state and intent analysis. View details
    Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
    Michele Marazzi
    Kaveh Razavi
    Salman Qazi
    Diego Meyer
    Patrick Jattke
    IEEE Security & Privacy (S&P) (2026)
    SAC133 - SSAC Comments on Proposed Root KSK Algorithm Rollover
    Wes Hardaker
    Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2026), pp. 9
    Preview abstract The SSAC supports the transition from RSA with SHA-256 (Algorithm 8) to ECDSA P-256 with SHA-256 (Algorithm 13) as the cryptographic algorithm for the RootKSK. The root zone has relied on RSA-based algorithms since DNSSEC signing began in 2010. The algorithm did not change during the first KSK rollover in 2018 or during the second rollover currently underway and scheduled to complete in October 2026. Establishing a clear and predictable process for algorithm transitions is essential to the long-term security of the root zone, and the SSAC observes that the proposal addresses the Recommendation 23 of the SSR2 Review accordingly. The SSAC notes that the proposal builds upon the Root Zone DNSSEC Algorithm Rollover Study published by ICANN in May 2024, which assessed resolver and authoritative server support for alternative algorithms, analyzed rollover methodologies, and evaluated operational risks. The SSAC finds that the proposal implements the study’s recommendations. The SSAC also notes that this proposal is consistent with the SSAC’s prior work on DNSSEC key rollover, including SAC063, SAC073, SAC102, and SAC108. The SSAC encourages ICANN to proceed with this rollover. Specific comments on the proposal’s methodology, timeline, and operational readiness follow View details
    DeduBB: Binary Code Size Reduction via Post-Link Basic Block De-duplication
    Chaitanya Mamatha Ananda
    Rajiv Gupta
    Mahbod Afarin
    Han Shen
    LCTES (Languages, Compilers, Tools and Theory of Embedded Systems) (2026) (to appear)
    Preview abstract Binary sizes of newer versions of software applications tend to be larger, primarily due to feature bloat. This poses various challenges, particularly for mobile applications. It affects upgrade rates directly impacting revenues, increases maintenance costs of supporting multiple versions, and prevents some users from getting critical security fixes. Code bloat also poses a problem for large warehouse-scale applications. Such applications experience performance degradation when their code size exceeds what smaller and more efficient code models can handle. In this paper, we introduce a post-link optimization tech nique called DeduBB, which deduplicates basic blocks of an application across procedure boundaries. While prior tech- niques used function outlining to de-duplicate redundant code sequences, it missed out on many opportunities as it cannot handle code that manipulates the program stack. In addition, previous techniques were either limited to the scope of a module or lacked scalable implementations required to handle large warehouse-scale applications. Our technique, DeduBB, handles all types of code duplication as we use a novel save-and-jump code pattern to execute de-duplicated code blocks. In addition, DeduBB has been designed to work on scalable post-link optimizers and can even be applied to large warehouse-scale datacenter applications. Finally, DeduBB is profile-guided and can be applied selectively to infrequently executed cold basic blocks to not affect application performance. In fact, in several cases, the performance of the smaller application binary improves due to reductions in its hot working set size. We have implemented our technique on the state-of-the-art post link optimizers, BOLT and Propeller. Experiments show that we can significantly reduce the code size of several benchmarks by 1.55% to 18.63%, on both Arm and x86 platforms, and on binaries that have already been heavily optimized for size using existing code size reduction features. Furthermore, aided by profiles, our technique can retain more than 80% of the maximal code size savings without affecting performance. View details
    ×