Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10473 publications
Preview abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks. In particular, improvements in reasoning abilities and the expansion of context windows have opened new avenues for leveraging these powerful models.
NL2SQL is challenging in that the natural language question is inherently ambiguous, while the SQL generation requires a precise understanding of complex data schema and semantics. One approach to this semantic ambiguous problem is to provide more and sufficient contextual information.
In this work, we explore the performance and the latency trade-offs of the extended context window (a.k.a., long context) offered by Google's state-of-the-art LLM (\textit{gemini-1.5-pro}).
We study the impact of various contextual information, including column example values, question and SQL query pairs, user-provided hints, SQL documentation, and schema. To the best of our knowledge, this is the first work to study how the extended context window and extra contextual information can help NL2SQL generation with respect to both accuracy and latency cost.
We show that long context LLMs are robust and do not get lost in the extended contextual information. Additionally, our long-context NL2SQL pipeline based on Google's \textit{gemini-pro-1.5} achieve a strong performance with 67.41\% on BIRD benchmark (dev) without finetuning and expensive self-consistency based techniques.
View details
Oculomics: Current Concepts and Evidence
Zhuoting Zhu
Yueye Wang
Ziyi Qi
Wenyi Hu
Xiayin Zhang
Siegfried Wagner
Yujie Wang
An Ran Ran
Joshua Ong
Ethan Waisberg
Mouayad Masalkhi
Alex Suh
Yih Chung Tham
Carol Y. Cheung
Xiaohong Yang
Honghua Yu
Zongyuan Ge
Wei Wang
Bin Sheng
Andrew G. Lee
Alastair Denniston
Peter van Wijngaarden
Pearse Keane
Ching-Yu Cheng
Mingguang He
Tien Yin Wong
Progress in Retinal and Eye Research (2025)
Preview abstract
The eye provides novel insights into general health, as well as pathogenesis and development of systemic diseases. In the past decade, growing evidence has demonstrated that the eye's structure and function mirror multiple systemic health conditions, especially in cardiovascular diseases, neurodegenerative disorders, and kidney impairments. This has given rise to the field of oculomics- the application of ophthalmic biomarkers to understand mechanisms, detect and predict disease. The development of this field has been accelerated by three major advances: 1) the availability and widespread clinical adoption of high-resolution and non-invasive ophthalmic imaging (“hardware”); 2) the availability of large studies to interrogate associations (“big data”); 3) the development of novel analytical methods, including artificial intelligence (AI) (“software”). Oculomics offers an opportunity to enhance our understanding of the interplay between the eye and the body, while supporting development of innovative diagnostic, prognostic, and therapeutic tools. These advances have been further accelerated by developments in AI, coupled with large-scale linkage datasets linking ocular imaging data with systemic health data. Oculomics also enables the detection, screening, diagnosis, and monitoring of many systemic health conditions. Furthermore, oculomics with AI allows prediction of the risk of systemic diseases, enabling risk stratification, opening up new avenues for prevention or individualized risk prediction and prevention, facilitating personalized medicine. In this review, we summarise current concepts and evidence in the field of oculomics, highlighting the progress that has been made, remaining challenges, and the opportunities for future research.
View details
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Alexander Immer
Alex Bo-Yuan Chen
Mariela D. Petkova
Nirmala A. Iyer
Luuk Willem Hesselink
Aparna Dev
Gudrun Ihrke
Woohyun Park
Alyson Petruncio
Aubrey Weigel
Wyatt Korff
Florian Engert
Jeff W. Lichtman
Misha B. Ahrens
International Conference on Learning Representations (ICLR) (2025)
Preview abstract
Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4d light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods.
View details
Validation of Quantum Elliptic Curve Point Addition Circuits
(2025) (to appear)
Preview abstract
Specific quantum algorithms exist to—in theory—
break elliptic curve cryptographic protocols. Implementing these
algorithms requires designing quantum circuits that perform elliptic curve arithmetic. To accurately judge a cryptographic protocol’s resistance against future quantum computers, researchers
figure out minimal resource-count circuits for performing these
operations while still being correct. To assure the correctness of
a circuit, it is integral to restore all ancilla qubits used to their
original states. Failure to do so could result in decoherence of the
computation’s final result. Through rigorous classical simulation
and unit testing, I surfaced four inconsistencies in the state-ofthe-art quantum circuit for elliptic curve point addition where
the circuit diagram states the qubits are returned in the original
(|0⟩) state, but the intermediate values are not uncomputed. I
provide fixes to the circuit without increasing the leading-order
gate cost.
View details
I know what I don't know: improving model cascades through confidence tuning
Stephan Rabanser
Nathalie Rauschmayr
Petra Poklukar
Congchao Wang
2025
Preview abstract
Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluated our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
View details
Preview abstract
Hashing is a fundamental operation in various computer sci-
ence applications. Despite the prevalence of specific key
formats like social security numbers, MAC addresses, plate
numbers, and URLs, hashing libraries typically treat them as
general byte sequences. This paper introduces a technique
for synthesizing specialized hash functions tailored to par-
ticular byte formats. The proposed code generation method
leverages three prevalent patterns: (i) fixed-length keys, (ii)
keys with common subsequences, and (iii) keys ranging on
predetermined sequences of bytes. The code generation pro-
cess involves two algorithms: one identifies relevant regular
expressions within key examples, and the other generates
specialized hash functions based on these expressions. This
approach, straightforward to implement, showcases improve-
ments over highly optimized hash function implementations.
Comparative analysis demonstrates that our synthetic func-
tions outperform counterparts in the C++ Standard Template
Library and the Google Abseil Library, achieving speedups
ranging from 2% to 11%, depending on the key format.
View details
Perch 2.0: The Bittern Lesson for Bioacoustics
Bart van Merriënboer
Tom Denton
arXiv (2025)
Preview abstract
Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.
View details
Security Assurance in the Age of Generative AI
Tom Grzelak
Kara Olive
Moni Pande
Google, Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2025)
Preview abstract
Artificial Intelligence (AI) is a rapidly growing field known for experimentation and quick iteration, qualities that can pose challenges for traditional enterprise security approaches. Because AI introduces unique assets and surfaces—AI-driven applications, agents, assistants, vast training datasets, the models themselves, and supporting infrastructure—we’re continually updating our security controls, guided by Google’s Secure AI Framework (SAIF).
To address the new challenges, we’ve expanded our traditional security approaches to cover the new attack surfaces by scanning for more types of vulnerabilities, analyzing more intel, preparing to respond to new kinds of incidents, and continually testing our controls in novel ways to strengthen our security posture.
This white paper is one of a series describing our approaches to implementing Google’s SAIF. In this paper we explain how we’re applying security assurance—a cross functional effort aiming to achieve high confidence that our security features, practices, procedures, controls, and architecture accurately mediate and enforce our security policies—to AI development. Security assurance efforts help to both ensure the continued security of our AI products and address relevant policy requirements.
Just as quality assurance (QA) in manufacturing meticulously examines finished products and the processes that create them to ensure they meet quality standards, security assurance serves a complementary role to the broader security efforts within an organization. Those broader security efforts span the design, implementation, and operation of controls to create secure software products; security assurance focuses on verifying and improving those efforts. Security assurance identifies gaps, weaknesses, and areas where controls may not be operating as intended, to drive continuous improvement across all security domains. It’s two-party review in action—security assurance helps build confidence that the software was not just built securely, but continues to run securely.
Since AI systems—those that use AI models for reasoning—present a combination of well understood and novel risks, AI technologies require a combination of both common and novel controls. No matter how strong these controls are, a security assurance program is essential to ensure they are working as intended and that they are continually updated and improved.
The paper opens with an overview of security assurance functions, covering several teams and capabilities that work together to ensure security controls are working across any software development lifecycle, including the AI development lifecycle. In particular, we focus on four functions—Red Teaming, Vulnerability Management, Detection & Response, and Threat Intelligence, and how those work together to address issues through Remediation.
We then describe the features specific to AI that affect assurance functions and give examples of how we’re adapting our approaches to account for AI-specific technologies and risks. We also include guidance for organizations considering creating their own AI assurance programs, including best practices for assuring training data, models, the AI software supply chain, and product integrations.
We intend this paper to be useful for a broad technical audience, including both assurance specialists who are new to AI technologies, and AI developers who are new to assurance practices.
View details
Preview abstract
We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from the context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementing questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically,
* We introduce a new definitional paradigm -- Narcissus Resiliency -- to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side-effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts including differential privacy and other security notions of one-way functions and encryption schemes.
* We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
View details
Differentiable Approximations for Distance Queries
David M. Mount
Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
Preview abstract
The widespread use of gradient-based optimization has motivated the adaptation of various classical algorithms into differentiable solvers compatible with learning pipelines. In this paper, we investigate the enhancement of traditional geometric query problems such that the result consists of both the geometric function as well as its gradient. Specifically, we study the fundamental problem of distance queries against a set of points P in R^d, which also underlies various similarity measures for learning algorithms.
The main result of this paper is a multiplicative (1+epsilon)-approximation of the Euclidean distance to P which is differentiable at all points in R^d \ P with asymptotically optimal bounds on the norms of its gradient and Hessian, from a data structure with storage and query time matching state-of-the-art results for approximate nearest-neighbor searching. The approximation is realized as a regularized distance through a partition-of-unity framework, which efficiently blends multiple local approximations, over a suitably defined covering of space, into a smooth global approximation. In order to obtain the local distance approximations in a manner that facilitates blending, we develop a new approximate Voronoi diagram based on a simple point-location data structure, simplifying away both the lifting transformation and ray shooting.
View details
The Case for Leveraging Transport Signals to Improve Internet Speed Test Efficiency
Cristina Leon
Computer Communication Review (2025) (to appear)
Preview abstract
Internet speed tests are an important tool to enable consumers and regulators to monitor the quality of Internet access. However, increased Internet speeds to the home and an increased demand for speed testing pose scaling challenges to providers of speed tests, who must maintain costly infrastructure to keep up with this demand. In recent years, this has led the popular NDT speed test to limit data transfer to a total of 250MB, which comes at the cost of accuracy for high bandwidth speed test clients.
In this paper, we observe that the NDT speed test server’s congestion control algorithm (BBRv1) is also trying to estimate the capacity of the connection. We leverage this observation and signals from BBR to improve the accuracy and efficiency of speed tests. We first show how leveraging signals from BBR can more than double the accuracy of a 10MB test–from 17% to 43%–for clients with speeds over 400Mbps.
We then show how using BBR signals to adaptively end the speed test reduces data transfer by 36% and increased accuracy by 13% for high bandwidth clients, relative to a 100MB fixed length test. Even accounting for clients that never observe enough samples to utilize the BBR signal, this adaptive approach still uses 25% less data than a fixed 100MB test with 37-44% higher accuracy.
View details
DORA Impact of Generative AI in Software Development
Derek DeBellis
Daniella Villalba
Nathen Harvey
DORA, Google (2025)
Preview abstract
Generative AI is transforming how software is built, offering unprecedented opportunities and raising new challenges. Based on extensive research and developer interviews, this DORA report provides a nuanced understanding of AI's impact on individuals, teams, and organizations.
View details
Silent Data Corruption by 10× Test Escapes Threatens Reliable Computing
Subhasish Mitra
Rama Govindaraju
Eric Liu
IEEE Design & Test (2025) (to appear)
Preview abstract
Too many defective compute chips are escaping today’s manufacturing tests – at least an order of magnitude more than industrial targets across all compute chip types in data centers. Silent data corruptions (SDCs) caused by test escapes, when left unaddressed, pose a major threat to reliable computing. We present a three-pronged approach outlining future directions for overcoming test escapes: (a) Quick diagnosis of defective chips directly from system-level incorrect behaviors. Such diagnosis is critical for gaining insights into why so many defective chips escape existing manufacturing testing. (b) In-field detection of defective chips. (c) New test experiments to understand the effectiveness of new techniques for detecting defective chips. These experiments must overcome the drawbacks and pitfalls of previous industrial test experiments and case studies.
View details
A Recipe for Improving Remote Sensing Zero Shot Generalization
Aviad Barzilai
Yotam Gigi
Vered Silverman
Yehonathan Refael
Bolous Jaber
Amr Helmy
3rd ML4RS Workshop at ICLR 2025
Preview abstract
Foundation models have had a significant impact across various AI applications, enabling applications for use cases that were previously impossible. Visual language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited.
In this work, we first introduce two novel image-caption datasets for training of remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google-Maps data, with high-quality captions generated using Gemini. The second utilizes public web images and their corresponding alt-text, filtered for only remote sensing domain, resulting in a highly diverse dataset.
We show that using these datasets to pre-train the Mammut [], a VLM architecture, results in state-of-the-art generalization performance in a zero-shot classification and cross-modal retrieval on well-known public benchmarks. Secondly, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We subsequently propose an iterative self-supervised fine-tuning approach where samples aligned with these attention maps are iteratively pseudo-labeled and utilized for model training.
View details
Preview abstract
In this article, we describe our human-centered research focused on understanding the role of collaboration and teamwork in productive software development. We describe creation of a logs-based metric to identify collaboration through observable events and a survey-based multi-item scale to assess team functioning.
View details