Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10129 publications
    Preview abstract In this talk, we will introduce the development and evolution of speaker diarization technologies at Google in the past decade, and how they landed as impactful products such as Cloud Speech-to-Text and the Pixel Recorder app. The talk will cover four critical milestones of the speaker diarization technologies at Google: (1) leveraging deep speaker embeddings; (2) leveraging supervised clustering; (3) leveraging sequence transducers; and (4) leveraging large language models. The talk will also discuss how speaker diarization will evolve in the new era of multimodal large language models. View details
    "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
    Michael Xieyang Liu
    Frederick Liu
    Alex Fiannaca
    Terry Koo
    In Extended Abstract in ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
    Preview abstract Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool. View details
    Preview abstract Relational affect is the affective response (encompassing emotion, expression, feeling) that emerges from an interaction between two people. The case study presented here introduces the concept of relational affect through a human perceptual rating task. Forty-five raters watched short video clips of two people interacting and described their perceived emotion of the individuals and that of the overall interaction. Our qualitative analysis of the rater responses showed that raters used a variety of schemes to reason about emotion, including expressions, context, and perceived appraisal of the event. These reasoning schemes were notably different for perceived individual emotion and relational affect. Our findings show that the vocabulary use for relational affect is distinct from that of individual emotion and relational affect as a phenomenon deepens our understanding of social interactions and moves the field a step closer to realizing the goal of fluid interactions between people and technology. View details
    Preview abstract In this article, we study the evolution of Android permissions. We describe the rationale behind key changes in Android’s permission model and disclose two permission-related security vulnerabilities we discovered. Finally, we provide developers actionable insights to proactively address permission-related security and privacy risks during development. View details
    USER-LLM: Efficient LLM Contextualization with User Embedding
    Jiaxing Wu
    Neo Wu
    Devora Berlowitz
    Sushant Prakash
    Bradley Green
    Shawn O'Banion
    Jun Xie
    ArXiv (2024) (to appear)
    Preview abstract Large language models (LLMs) have revolutionized natural language processing. However, effectively incorporating complex and potentially noisy user interaction data remains a challenge. To address this, we propose User-LLM, a novel framework that leverages user embeddings to contextualize LLMs. These embeddings, distilled from diverse user interactions using self-supervised pretraining, capture latent user preferences and their evolution over time. We integrate these user embeddings with LLMs through cross-attention and soft-prompting, enabling LLMs to dynamically adapt to user context. Our comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate significant performance gains across various tasks. Notably, our approach outperforms text-prompt-based contextualization on long sequence tasks and tasks that require deep user understanding while being computationally efficient. We further incorporate Perceiver layers to streamline the integration between user encoders and LLMs, reducing computational demands. View details
    UINav: A Practical Approach to Train On-Device Automation Agents
    Wei Li
    Fu-Lin Hsu
    Will Bishop
    Folawiyo Campbell-Ajala
    2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) - Industry Track
    Preview abstract Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices, yet achieving high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav, uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations UINav, can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy. View details
    Optimizing quantum gates towards the scale of logical qubits
    Alexandre Bourassa
    Andrew Dunsworth
    Will Livingston
    Vlad Sivak
    Trond Andersen
    Yaxing Zhang
    Desmond Chik
    Jimmy Chen
    Charles Neill
    Alejo Grajales Dau
    Anthony Megrant
    Alexander Korotkov
    Vadim Smelyanskiy
    Yu Chen
    Nature Communications, 15 (2024), pp. 2442
    Preview abstract A foundational assumption of quantum error correction theory is that quantum gates can be scaled to large processors without exceeding the error-threshold for fault tolerance. Two major challenges that could become fundamental roadblocks are manufacturing high-performance quantum hardware and engineering a control system that can reach its performance limits. The control challenge of scaling quantum gates from small to large processors without degrading performance often maps to non-convex, high-constraint, and time-dynamic control optimization over an exponentially expanding configuration space. Here we report on a control optimization strategy that can scalably overcome the complexity of such problems. We demonstrate it by choreographing the frequency trajectories of 68 frequency-tunable superconducting qubits to execute single- and two-qubit gates while mitigating computational errors. When combined with a comprehensive model of physical errors across our processor, the strategy suppresses physical error rates by ~3.7× compared with the case of no optimization. Furthermore, it is projected to achieve a similar performance advantage on a distance-23 surface code logical qubit with 1057 physical qubits. Our control optimization strategy solves a generic scaling challenge in a way that can be adapted to a variety of quantum operations, algorithms, and computing architectures. View details
    Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
    Brenna Li
    Amy Wang
    Patricia Strachan
    Julie Anne Seguin
    Sami Lachgar
    Karyn Schroeder
    Renee Wong
    Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, pp. 10
    Preview abstract Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation to be helpful in understanding their medical situation and alleviate their concerns. Through qualitative coding of the conversation transcripts, we provide insight on the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings. View details
    Productive Coverage: Improving the Actionability of Code Coverage
    Gordon
    Luka Kalinovcic
    Marko Ivanković
    Mateusz Lewko
    Rene Just
    Yana Kulizhskaya
    ICSE-SEIP '24: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (2024) (to appear)
    Preview abstract Code coverage is an intuitive and established test adequacy measure. However, not all parts of the code base are equally important, and hence additional testing may be critical for some uncovered code, whereas it may not be worthwhile for other uncovered code. As a result, simply visualizing uncovered code is not reliably actionable. To make code coverage actionable and further improve code coverage in our codebase, we developed Productive Coverage — a novel approach to code coverage that guides developers to uncovered code that that should be tested by (unit) tests. Specifically, Productive Coverage identifies uncovered code that is similar to existing code, which in turn is tested and/or frequently executed in production. We implemented and evaluated Productive Coverage for four programming languages (C++, Java, Go, and Python). The evaluation shows: (1) The developer sentiment, measured at the point of use, is strongly positive; (2) Productive Coverage meaningfully increases code coverage above a strong baseline; (3) Productive Coverage has no negative effect on code authoring efficiency; (4) Productive Coverage modestly improves code-review effiency; (5) Productive Coverage directly improves code quality and prevents bugs from being introduced, in addition to improving test quality View details
    Preview abstract In the current digital world, large organizations (sometimes referred to as tech giants) provide service to extremely large numbers of users. The service provider is often interested in computing various data analyses over the private data of its users, which in turn have their incentives to cooperate, but do not necessarily trust the service provider. In this work, we introduce the Gulliver multi-party computation model (GMPC) to realistically capture the above scenario. The GMPC model considers a single highly powerful party, called the server or Gulliver, that is connected to n users over a star topology network (alternatively formulated as a full network, where the server can block any message). The users are significantly less powerful than the server, and, in particular, should have both computation and communication complexities that are polylogarithmic in n. Protocols in the GMPC model should be secure against malicious adversaries that may corrupt a subset of the users and/or the server. Designing protocols in the GMPC model is a delicate task, since users can only hold information about polylog(n) other users (and, in particular, can only communicate with polylog(n) other users). In addition, the server can block any message between any pair of honest parties. Thus, reaching an agreement becomes a challenging task. Nevertheless, we design generic protocols in the GMPC model, assuming that <1/8 fraction of the users may be corrupted (in addition to the server). Our main contribution is a variant of Feige's committee election protocol [FOCS 1999] that is secure in the GMPC model. Given this tool we show: * Assuming fully homomorphic encryption (FHE), any computationally efficient function with O(n polylog(n))-size output can be securely computed in the GMPC model. * Any function that can be computed by a circuit of O(polylog(n)) depth, O(n polylog(n))$ size, and bounded fan-in and fan-out can be securely computed in the GMPC model assuming vector commitment schemes (without assuming FHE). * In particular, sorting can be securely computed in the GMPC model assuming vector commitment schemes. This has important applications for the shuffle model of differential privacy, and resolves an open question of Bell et al. [CCS 2020]. View details
    Streamlining Workload Management in AI-Driven Cloud Architectures: A Comparative Algorithmic Approach
    Pravallika Mannem
    Kiran Kumar Patibandla
    International Research Journal of Engineering and Technology, 11 (2024), pp. 113-121
    Preview abstract The use of artificial intelligence (AI) in cloud architectures has significantly increased processing efficiency and scale. However, with the development of complex algorithms and big data as well as surprisingly entered into our machine learning world; workload management becomes a significant issue in AI cloud computing. Existing workload management solutions are rule-based heuristics that may result in underutilization of resources and poor performance. For that, we present an algorithmic comparative approach to easing the burden of workload management for AI-driven cloud architectures. This is in contrast to executing a batch of tasks with different algorithms and comparing performance, cost, etc. We use ML methods to determine the best algorithm for our workload, and then deploy this in a self-contained binary that can switch between algorithms at runtime on an available resource. We validated our scheme with simulations, which demonstrates the capability of superior resource use and diminished completion time in comparison to rule-based schemes. When needed, flexibility and scalability allow you easier control over workloads that are subject to change or allocation. By simplifying AI-driven cloud workload management, the elasticity of their overall approach greatly enhances efficiency and scalability for those organizations looking to run even larger and take advantage of more complex workloads faster Tweet this Share on Facebook. View details
    Preview abstract A product manager’s specific role varies from one company to the next. Still, all product managers balance many aspects of their job, including customers’ needs, a vision for new products, and the project team. So what tools and strategies are needed to create a successful career as a product manager? What are the “5 Things You Need To Create A Successful Career As A Product Manager”? Authority Magazine speaks with Aqsa Fulara, a product manager at Google to answer these questions with stories and insights from her experiences. View details
    Preview abstract Algorithms for the computation of alternative routes in road networks power many geographic navigation systems. A good set of alternative routes offers meaningful options to the user of the system and can support applications such as routing that is robust to failures (e.g., road closures, extreme traffic congestion, etc.) and routing with diverse preferences and objective functions. Algorithmic techniques for alternative route computation include the penalty method, via-node type algorithms (which deploy bidirectional search and finding plateaus), and, more recently, electrical-circuit based algorithms. In this work we focus on the practically important family of via-node type algorithms and we aim to produce high quality alternative routes for road netowrks. We study alternative route computation in the presence of a fast routing infrastructure that relies on hierarchical routing (namely, CRP). We propose new approaches that rely on deep learning methods. Our training methodology utilizes the hierarchical partition of the graph and builds models to predict which boundary road segments in the partition should be crossed by the alternative routes. We describe our methods in detail and evaluate them against the previously studied architectures, as well as against a stronger baseline that we define in this work, showing improvements in quality in the road networks of Seattle, Paris, and Bangalore. View details
    Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines
    Aditya U Kale
    Alastair Dennison
    Alexander Martindale
    An Wen Chan
    Andrew Beam
    Benjamin Ng
    Cecilia S. Lee
    Christopher Kelly
    Christopher Yau
    David Moher
    Gary Collins
    Lauren Oakden-Rayner
    Lavinia Ferrante di Ruffano
    Melanie Calvert
    Melissa D McCradden
    Pearse Keane
    Robert Golub
    Samantha Cruz Rivera
    Victoria Ngai
    Xiaoxuan Liu
    Nature Communications (2024)
    Preview abstract The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published but their completeness and transparency of reporting is unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. 65 RCTs were identified, mostly conducted in China (37%) and USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77–94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite a generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines. View details
    PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
    Adel Javanmard
    Proceedings of the 41st International Conference on Machine Learning (2024), pp. 21410-21429
    Preview abstract This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called \emph{bags} in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We propose the \texttt{PriorBoost} algorithm, which iteratively forms increasingly homogenous bags with respect to (unseen) individual responses to improve model quality. We also explore label differential privacy for aggregate learning, and provide extensive experiments that demonstrate that \PriorBoost regularly achieves optimal quality, in contrast to non-adaptive algorithms for aggregate learning. View details