Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10069 publications
    Preview abstract Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replete with missing information, visual inconsistencies, and hallucinations. To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness. Our dataset significantly improves across these dimensions compared to recently released datasets (+66%) and GPT-4V outputs (+48%). Furthermore, models fine-tuned with IIW data excel by +31% against prior work along the same human evaluation dimensions. Given our fine-tuned models, we also evaluate text-to-image generation and vision-language reasoning. Our model's descriptions can generate images closest to the original, as judged by both automated and human metrics. We also find our model produces more compositionally rich descriptions, outperforming the best baseline by up to 6% on ARO, SVO-Probes, and Winoground datasets. View details
    HyperAttention: Large-scale Attention in Linear Time
    Amin Karbasi
    Amir Zandieh
    Insu Han
    David Woodruff
    HyperAttention: Long-context Attention in Near-Linear Time (2024) (to appear)
    Preview abstract In this paper, we introduce a novel approximate attention mechanism dubbed ``HyperAttention``. Despite the rapidly increasing size and complexity of contexts used with Large Language Models (LLM), there is still a dire lack of computationally efficient attention mechanisms scaling better than the naive quadratic time. HyperAttention addresses this gap: it delivers provably linear time complexity with respect to the size of the context, while only incurring a negligible loss in downstream quality. Distinctively, it integrates the principles of Locality Sensitive Hashing (LSH), for efficient detection of heavy elements, along with uniform column sampling, allowing for a good approximation both of the heavy and light components of the attention matrix. HyperAttention provably approximates the attention layer in \textit{linear time}, making it the first practical linear time approximate attention mechanism. Crucially, HyperAttention has a highly-modular design, allowing seamless integration of other rapid low-level implementations, most notably FlashAttention. Empirical evaluations indicate that HyperAttention surpasses the existing methods, achieving orders of magnitude speed-up when compared to prevalent state-of-the-art solutions such as Flash Attention. This breakthrough presents significant implications for enabling the scalability of LLMs to significantly larger contexts. View details
    Shadow Hamiltonian Simulation
    Rolando Somma
    Robbie King
    Thomas O'Brien
    arXiv:2407.21775 (2024)
    Preview abstract We present shadow Hamiltonian simulation, a framework for simulating quantum dynamics using a compressed quantum state that we call the “shadow state”. The amplitudes of this shadow state are proportional to the expectations of a set of operators of interest. The shadow state evolves according to its own Schrodinger equation, and under broad conditions can be simulated on a quantum computer. We analyze a number of applications of this framework to quantum simulation problems. This includes simulating the dynamics of exponentially large systems of free fermions, or exponentially large systems of free bosons, the latter example recovering a recent algorithm for simulating exponentially many classical harmonic oscillators. Shadow Hamiltonian simulation can be extended to simulate expectations of more complex operators such as two-time correlators or Green’s functions, and to study the evolution of operators themselves in the Heisenberg picture View details
    Preview abstract Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the large language model, which generates personalized output, is frozen and can only be accessed through APIs. Under this constraint, all one can do is to improve the input text (i.e., text prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose a novel method to automatically revise prompts for personalized text generation. The proposed method takes the initial prompts generated by a state-of-the-art, multistage framework for personalized generation and rewrites a few critical components that summarize and synthesize the personal context. The prompt rewriter employs a training paradigm that chains together supervised learning (SL) and reinforcement learning (RL), where SL reduces the search space of RL and RL facilitates end-to-end training of the rewriter. Using datasets from three representative domains, we demonstrate that the rewritten prompts outperform both the original prompts and the prompts optimized via supervised learning or reinforcement learning alone. In-depth analysis of the rewritten prompts shows that they are not only human readable, but also able to guide manual revision of prompts when there is limited resource to employ reinforcement learning to train the prompt rewriter, or when it is costly to deploy an automatic prompt rewriter for inference. View details
    Towards Generalist Biomedical AI
    Danny Driess
    Andrew Carroll
    Chuck Lau
    Ryutaro Tanno
    Ira Ktena
    Anil Palepu
    Basil Mustafa
    Aakanksha Chowdhery
    Simon Kornblith
    Philip Mansfield
    Sushant Prakash
    Renee Wong
    Sunny Virmani
    Sara Mahdavi
    Bradley Green
    Ewa Dominowska
    Joelle Barral
    Karan Singhal
    Pete Florence
    NEJM AI (2024)
    Preview abstract BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery. METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports. RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems. View details
    Open Se Cura: First Silicon Results of an Auditable and Transparent Hardware Root of Trust System using Open EDA in 16-nm
    Guanchen Tao
    Ming-Hung Chen
    Bangfei Pan
    Kai Yick
    Dennis Sylvester
    Mehdi Saligane
    IEEE Solid-State Circuits Magazine, 16(2024), pp. 58-66
    Preview abstract Hardware root of trust (HRoT) is essential for IoT devices as it provides critical user data protection. However, each novel use case significantly lengthens the development time for an HRoT system. Furthermore, most HRoT solutions are proprietary, and users lack permission to inspect and audit such systems [1] , [2] . This article introduces Open Se Cura, which is an open source framework designed to expedite the implementation of secure and transparent HRoT systems. The platform grants designers the flexibility to choose their preferred electronic design automation (EDA) tools. They can opt for proprietary EDA solutions or select from open source alternatives, including OpenROAD [3] , [4] , using the OpenFASOC framework [5] , [6] . Additionally, the platform supports the use of open source process design kits (PDKs) to present a transparent and auditable approach to hardware–software co-design. This approach enables fast and trustworthy HRoT system implementation and is openly available to reproduce its results and security efficacy [7] . The extended version of the Open Se Cura reference design is showcased through FPGA emulation and its 22-nm ASIC implementation. We finally present the first measurement results of a 16-nm silicon implementation of selected components from OpenTitan, the security RoT hardware building block of Open Se Cura. This work was integrated using OpenFASOC’s modular flow, which allows one to call for open tools, such as OpenROAD, for physical design and closed tools for the missing open source EDA in 16 nm. View details
    Preview abstract Effective model calibration is a critical and indispensable component in developing Media Mix Models (MMMs). One advantage of Bayesian-based MMMs lies in their capacity to accommodate the information from experiment results and the modelers' domain knowledge about the ad effectiveness by setting priors for the model parameters. However, it remains ambiguous about how and which Bayesian priors should be tuned for calibration purpose. In this paper, we propose a new calibration method through model reparameterization. The reparameterized model includes Return on Ads Spend (ROAS) as a model parameter, enabling straightforward adjustment of its prior distribution to align with either experiment results or the modeler's prior knowledge. The proposed method also helps address several key challenges regarding combining MMMs and incrementality experiments. We use simulations to demonstrate that our approach can significantly reduce the bias and uncertainty in the resultant posterior ROAS estimates. View details
    Preview abstract AI-powered software development tooling is changing the way that developers interact with tools and write code. However, the ability for AI to truly transform software development depends on developers' level of trust in the tools. In this work, we take a mixed methods approach to measuring the factors that influence developers' trust in AI-powered code completion. We identified that familiarity with AI suggestions, quality of the suggestion, and level of expertise with the language all increased acceptance rate of AI-powered suggestions. While suggestion length and presence in a test file decreased acceptance rates. Based on these findings we propose recommendations for the design of AI-powered development tools to improve trust. View details
    Preview abstract We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers. In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that fast converge to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement of compression for DP-SGD across various FL tasks. View details
    Found in the middle: Calibrating Positional Attention Bias Improves Long Context Utilization
    Cheng-Yu Hsieh
    Yung-Sung Chuang
    Chun-Liang Li
    Abhishek Kumar
    James Glass
    Alexander Ratner
    Ranjay Krishna
    Preview abstract Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle to LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of its input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even though when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences. View details
    Knowledge Distillation with Perturbed Loss: From a Vanilla Teacher to a Proxy Teacher
    Rongzhi Zhang
    Chao Zhang
    Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024), ACM, pp. 4278 - 4289
    Preview abstract Knowledge distillation is a popular technique to transfer knowledge from a large teacher model to a small student model. Typically, the student learns to imitate the teacher by minimizing the KL divergence of its output distribution with the teacher's output distribution. In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's output distribution and the ground truth label distribution. Therefore, forcing the student to blindly imitate the unreliable teacher output distribution leads to inferior performance. To this end, we propose a novel knowledge distillation objective PTLoss by first representing the vanilla KL-based distillation loss function via a Maclaurin series and then perturbing the leading-order terms in this series. This perturbed loss implicitly transforms the original teacher into a proxy teacher with a distribution closer to the ground truth distribution. We establish the theoretical connection between this "distribution closeness'' and the student model generalizability, which enables us to select the PTLoss's perturbation coefficients in a principled way. Extensive experiments on six public benchmark datasets demonstrate the effectiveness of PTLoss with teachers of different scales. View details
    Prompt Cache: Modular Attention Reuse for Low Latency Inference
    Anurag Khandelwal
    Guojun Chen
    In Gim
    Lin Zhong
    Seung-seob Lee
    Νikhil Sarda
    MLSys (2024)
    Preview abstract We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing attention states across different LLM prompts. Many input prompts have overlapping text segments, such as system messages, prompt templates, and documents provided for context. Our key insight is that by precomputing and storing the attention states of these frequently occurring text segments on the inference server, we can efficiently reuse them when these segments appear in user prompts. Prompt Cache employs a schema to explicitly define such reusable text segments, called prompt modules. The schema ensures positional accuracy during attention state reuse and provides users with an interface to access cached states in their prompt. Using a prototype implementation, we evaluate Prompt Cache across several LLMs. We show that Prompt Cache significantly reduce latency in time-to-first-token, especially for longer prompts such as document-based question answering and recommendations. The improvements range from 8× for GPU-based inference to 60× for CPU-based inference, all while maintaining output accuracy and without the need for model parameter modifications. View details
    Preview abstract Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is made available to the public.We further discuss new domains that can be added to ToQB by community contributors and its practical applications. View details
    Preview abstract Given a training data-set $\mathcal{S}$, and a reference data-set $\mathcal{T}$, we design a simple and efficient algorithm to reweigh the loss function such that the limiting distribution of the neural network weights that result from training on $\mathcal{S}$ approaches the limiting distribution that would have resulted by training on $\mathcal{T}$. Such reweighing can be used to correct for Train-Test distribution shift, when we don't have access to the labels of $\mathcal{T}$. It can also be used to perform (soft) multi-criteria optimization on neural nets, when we have access to the labels of $\mathcal{T}$, but $\mathcal{S}$ and $\mathcal{T}$ have few common points. As a motivating application, we train a graph neural net to recognize small molecule binders to MNK2 (a MAP Kinase, responsible for cell signaling) which are non-binders to MNK1 (a very similar protein), even in the absence of training data common to both data-sets. We are able to tune the reweighing parameters so that overall change in holdout loss is negligible, but the selectivity, i.e., the fraction of top 100 MNK2 binders that are MNK1 non-binders, increases from 54\% to 95\%, as a result of our reweighing. We expect the algorithm to be applicable in other settings as well, since we prove that when the metric entropy of the input data-sets is bounded, our random sampling based greedy algorithm outputs a close to optimal reweighing, i.e., the two invariant distributions of network weights will be provably close in total variation distance. View details
    Preview abstract This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth. Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications. Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers. View details