Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10129 publications
Preview abstract
In this talk, we will introduce the development and evolution of speaker diarization technologies at Google in the past decade, and how they landed as impactful products such as Cloud Speech-to-Text and the Pixel Recorder app. The talk will cover four critical milestones of the speaker diarization technologies at Google: (1) leveraging deep speaker embeddings; (2) leveraging supervised clustering; (3) leveraging sequence transducers; and (4) leveraging large language models. The talk will also discuss how speaker diarization will evolve in the new era of multimodal large language models.
View details
"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
Michael Xieyang Liu
Frederick Liu
Alex Fiannaca
Terry Koo
In Extended Abstract in ACM CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM (2024), pp. 9 (to appear)
Preview abstract
Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adhere to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
View details
Relational Affect in Dyadic Interactions
CHI Conference on Human Factors in Computing Systems (2024)
Preview abstract
Relational affect is the affective response (encompassing emotion, expression, feeling) that emerges from an interaction between two people. The case study presented here introduces the concept of relational affect through a human perceptual rating task. Forty-five raters watched short video clips of two people interacting and described their perceived emotion of the individuals and that of the overall interaction. Our qualitative analysis of the rater responses showed that raters used a variety of schemes to reason about emotion, including expressions, context, and perceived appraisal of the event. These reasoning schemes were notably different for perceived individual emotion and relational affect. Our findings show that the vocabulary use for relational affect is distinct from that of individual emotion and relational affect as a phenomenon deepens our understanding of social interactions and moves the field a step closer to realizing the goal of fluid interactions between people and technology.
View details
Android Permissions: Evolution, Attacks, and Best Practices
IEEE Security & Privacy (2024)
Preview abstract
In this article, we study the evolution of Android permissions. We describe the rationale behind key changes in Android’s permission model and disclose two permission-related security vulnerabilities we discovered. Finally, we provide developers actionable insights to proactively address permission-related security and privacy risks during development.
View details
USER-LLM: Efficient LLM Contextualization with User Embedding
Jiaxing Wu
Neo Wu
Devora Berlowitz
Sushant Prakash
Bradley Green
Shawn O'Banion
Jun Xie
ArXiv (2024) (to appear)
Preview abstract
Large language models (LLMs) have revolutionized natural language processing. However, effectively incorporating complex and potentially noisy user interaction data remains a challenge. To address this, we propose User-LLM, a novel framework that leverages user embeddings to contextualize LLMs. These embeddings, distilled from diverse user interactions using self-supervised pretraining, capture latent user preferences and their evolution over time. We integrate these user embeddings with LLMs through cross-attention and soft-prompting, enabling LLMs to dynamically adapt to user context. Our comprehensive experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate significant performance gains across various tasks. Notably, our approach outperforms text-prompt-based contextualization on long sequence tasks and tasks that require deep user understanding while being computationally efficient. We further incorporate Perceiver layers to streamline the integration between user encoders and LLMs, reducing computational demands.
View details
UINav: A Practical Approach to Train On-Device Automation Agents
Wei Li
Fu-Lin Hsu
Will Bishop
Folawiyo Campbell-Ajala
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) - Industry Track
Preview abstract
Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices, yet achieving high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav, uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations UINav, can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy.
View details
Optimizing quantum gates towards the scale of logical qubits
Alexandre Bourassa
Andrew Dunsworth
Will Livingston
Vlad Sivak
Trond Andersen
Yaxing Zhang
Desmond Chik
Jimmy Chen
Charles Neill
Alejo Grajales Dau
Anthony Megrant
Alexander Korotkov
Vadim Smelyanskiy
Yu Chen
Nature Communications, 15 (2024), pp. 2442
Preview abstract
A foundational assumption of quantum error correction theory is that quantum gates can be scaled to large processors without exceeding the error-threshold for fault tolerance. Two major challenges that could become fundamental roadblocks are manufacturing high-performance quantum hardware and engineering a control system that can reach its performance limits. The control challenge of scaling quantum gates from small to large processors without degrading performance often maps to non-convex, high-constraint, and time-dynamic control optimization over an exponentially expanding configuration space. Here we report on a control optimization strategy that can scalably overcome the complexity of such problems. We demonstrate it by choreographing the frequency trajectories of 68 frequency-tunable superconducting qubits to execute single- and two-qubit gates while mitigating computational errors. When combined with a comprehensive model of physical errors across our processor, the strategy suppresses physical error rates by ~3.7× compared with the case of no optimization. Furthermore, it is projected to achieve a similar performance advantage on a distance-23 surface code logical qubit with 1057 physical qubits. Our control optimization strategy solves a generic scaling challenge in a way that can be adapted to a variety of quantum operations, algorithms, and computing architectures.
View details
Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
Brenna Li
Amy Wang
Patricia Strachan
Julie Anne Seguin
Sami Lachgar
Karyn Schroeder
Renee Wong
Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, pp. 10
Preview abstract
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation to be helpful in understanding their medical situation and alleviate their concerns. Through qualitative coding of the conversation transcripts, we provide insight on the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings.
View details
Productive Coverage: Improving the Actionability of Code Coverage
Gordon
Luka Kalinovcic
Marko Ivanković
Mateusz Lewko
Rene Just
Yana Kulizhskaya
ICSE-SEIP '24: Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (2024) (to appear)
Preview abstract
Code coverage is an intuitive and established test adequacy measure.
However, not all parts of the code base are equally important, and
hence additional testing may be critical for some uncovered code,
whereas it may not be worthwhile for other uncovered code. As a
result, simply visualizing uncovered code is not reliably actionable.
To make code coverage actionable and further improve code
coverage in our codebase, we developed Productive Coverage —
a novel approach to code coverage that guides developers to uncovered code that that should be tested by (unit) tests. Specifically,
Productive Coverage identifies uncovered code that is similar to
existing code, which in turn is tested and/or frequently executed in
production. We implemented and evaluated Productive Coverage
for four programming languages (C++, Java, Go, and Python). The
evaluation shows: (1) The developer sentiment, measured at the
point of use, is strongly positive; (2) Productive Coverage meaningfully increases code coverage above a strong baseline; (3) Productive
Coverage has no negative effect on code authoring efficiency; (4)
Productive Coverage modestly improves code-review effiency; (5)
Productive Coverage directly improves code quality and prevents
bugs from being introduced, in addition to improving test quality
View details
Preview abstract
In the current digital world, large organizations (sometimes referred to as tech giants) provide service to extremely large numbers of users. The service provider is often interested in computing various data analyses over the private data of its users, which in turn have their incentives to cooperate, but do not necessarily trust the service provider.
In this work, we introduce the Gulliver multi-party computation model (GMPC) to realistically capture the above scenario. The GMPC model considers a single highly powerful party, called the server or Gulliver, that is connected to n users over a star topology network (alternatively formulated as a full network, where the server can block any message). The users are significantly less powerful than the server, and, in particular, should have both computation and communication complexities that are polylogarithmic in n. Protocols in the GMPC model should be secure against malicious adversaries that may corrupt a subset of the users and/or the server.
Designing protocols in the GMPC model is a delicate task, since users can only hold information about polylog(n) other users (and, in particular, can only communicate with polylog(n) other users). In addition, the server can block any message between any pair of honest parties. Thus, reaching an agreement becomes a challenging task. Nevertheless, we design generic protocols in the GMPC model, assuming that <1/8 fraction of the users may be corrupted (in addition to the server). Our main contribution is a variant of Feige's committee election protocol [FOCS 1999] that is secure in the GMPC model. Given this tool we show:
* Assuming fully homomorphic encryption (FHE), any computationally efficient function with O(n polylog(n))-size output can be securely computed in the GMPC model.
* Any function that can be computed by a circuit of O(polylog(n)) depth, O(n polylog(n))$ size, and bounded fan-in and fan-out can be securely computed in the GMPC model assuming vector commitment schemes (without assuming FHE).
* In particular, sorting can be securely computed in the GMPC model assuming vector commitment schemes. This has important applications for the shuffle model of differential privacy, and resolves an open question of Bell et al. [CCS 2020].
View details
Streamlining Workload Management in AI-Driven Cloud Architectures: A Comparative Algorithmic Approach
Pravallika Mannem
Kiran Kumar Patibandla
International Research Journal of Engineering and Technology, 11 (2024), pp. 113-121
Preview abstract
The use of artificial intelligence (AI) in cloud architectures has significantly increased processing efficiency and scale. However, with the development of complex algorithms and big data as well as surprisingly entered into our machine learning world; workload management becomes a significant issue in AI cloud computing. Existing workload management solutions are rule-based heuristics that may result in underutilization of resources and poor performance. For that, we present an algorithmic comparative approach to easing the burden of workload management for AI-driven cloud architectures. This is in contrast to executing a batch of tasks with different algorithms and comparing performance, cost, etc. We use ML methods to determine the best algorithm for our workload, and then deploy this in a self-contained binary that can switch between algorithms at runtime on an available resource. We validated our scheme with simulations, which demonstrates the capability of superior resource use and diminished completion time in comparison to rule-based schemes. When needed, flexibility and scalability allow you easier control over workloads that are subject to change or allocation. By simplifying AI-driven cloud workload management, the elasticity of their overall approach greatly enhances efficiency and scalability for those organizations looking to run even larger and take advantage of more complex workloads faster Tweet this Share on Facebook.
View details
Preview abstract
A product manager’s specific role varies from one company to the next. Still, all product managers balance many aspects of their job, including customers’ needs, a vision for new products, and the project team. So what tools and strategies are needed to create a successful career as a product manager? What are the “5 Things You Need To Create A Successful Career As A Product Manager”? Authority Magazine speaks with Aqsa Fulara, a product manager at Google to answer these questions with stories and insights from her experiences.
View details
Preview abstract
Algorithms for the computation of alternative routes in road networks power many geographic navigation systems. A good set of alternative routes offers meaningful options to the user of the system and can support applications such as routing that is robust to failures (e.g., road closures, extreme traffic congestion, etc.) and routing with diverse preferences and objective functions. Algorithmic techniques for alternative route computation include the penalty method, via-node type algorithms (which deploy bidirectional search and finding plateaus), and, more recently, electrical-circuit based algorithms. In this work we focus on the practically important family of via-node type algorithms and we aim to produce high quality alternative routes for road netowrks. We study alternative route computation in the presence of a fast routing infrastructure that relies on hierarchical routing (namely, CRP). We propose new approaches that rely on deep learning methods. Our training methodology utilizes the hierarchical partition of the graph and builds models to predict which boundary road segments in the partition should be crossed by the alternative routes. We describe our methods in detail and evaluate them against the previously studied architectures, as well as against a stronger baseline that we define in this work, showing improvements in quality in the road networks of Seattle, Paris, and Bangalore.
View details
Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines
Aditya U Kale
Alastair Dennison
Alexander Martindale
An Wen Chan
Andrew Beam
Benjamin Ng
Cecilia S. Lee
Christopher Kelly
Christopher Yau
David Moher
Gary Collins
Lauren Oakden-Rayner
Lavinia Ferrante di Ruffano
Melanie Calvert
Melissa D McCradden
Pearse Keane
Robert Golub
Samantha Cruz Rivera
Victoria Ngai
Xiaoxuan Liu
Nature Communications (2024)
Preview abstract
The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published but their completeness and transparency of reporting is unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. 65 RCTs were identified, mostly conducted in China (37%) and USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77–94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite a generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines.
View details
PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
Adel Javanmard
Proceedings of the 41st International Conference on Machine Learning (2024), pp. 21410-21429
Preview abstract
This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called \emph{bags} in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We propose the \texttt{PriorBoost} algorithm, which iteratively forms increasingly homogenous bags with respect to (unseen) individual responses to improve model quality. We also explore label differential privacy for aggregate learning, and provide extensive experiments that demonstrate that \PriorBoost regularly achieves optimal quality, in contrast to non-adaptive algorithms for aggregate learning.
View details