Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10132 publications
Preview abstract
Cloud computing architectures are more scalable and economical, which is the main reason for their popularity. However, they bring their own challenges for workload scheduling and resource utilization, because virtual machines (VMs) and applications have to share resources such as servers and storage. Historically, workload balancing and resource management have relied on manual configuration or simplistic heuristics, which do not optimize resource usage and performance effectively. In this technical brief, we propose an approach that uses unsupervised learning techniques to detect usage patterns and improve resource utilization, supporting both optimal performance and an automatically balanced workload among VMs. We use clustering algorithms to group similar workloads and then allocate resources to each group based on its demand. The aim of this step is to use resources more effectively and avoid resource exhaustion. We also integrate anomaly detection methods into our system to identify and handle abnormal behavior, both when monitoring and when placing resources. We experiment with region traces from production workloads to demonstrate the benefits of our approach, showing marked improvements in workload balancing and resource utilization over current practices.
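The cluster-then-allocate step described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-feature workload samples, the plain Lloyd's k-means with a deterministic start, and the proportional-to-CPU-demand allocation rule are hypothetical and are not taken from the brief itself.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each workload to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def allocate(points, labels, k, capacity):
    """Split a capacity budget across clusters in proportion to total CPU demand."""
    demand = [sum(p[0] for p, label in zip(points, labels) if label == c) for c in range(k)]
    total = sum(demand)
    return [capacity * d / total for d in demand]


# Hypothetical (cpu_share, memory_share) samples for six workloads.
workloads = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25),
             (0.85, 0.90), (0.12, 0.18), (0.95, 0.85)]
labels, centroids = kmeans(workloads, k=2)
shares = allocate(workloads, labels, k=2, capacity=100.0)
```

Here the light and heavy workloads end up in separate groups, and the heavy group receives the larger share of the budget. A production system would presumably use richer features and a demand model rather than a single CPU column.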
View details
Hovering Over the Key to Text Input in XR
Diar Abdlkarim
Arpit Bhatia
Stuart Macgregor
Jason Fotso-Puepi
Hasti Seifi
Massimiliano Di Luca
Karan Ahuja
Preview abstract
Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond the PC, so there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions.
View details
Optimal Mechanisms for a Value Maximizer: The Futility of Screening Targets
Proceedings of the 25th ACM Conference on Economics and Computation (EC) (2024)
Preview abstract
Motivated by the increased adoption of autobidding algorithms in internet advertising markets, we study the design of optimal mechanisms for selling items to a value-maximizing buyer with a return-on-spend constraint. The buyer's values and target ratio in the return-on-spend constraint are private. We restrict attention to deterministic sequential screening mechanisms that can be implemented as a menu of prices paid for purchasing an item or not. The main result of this paper is to provide a characterization of an optimal mechanism. Surprisingly, we show that the optimal mechanism does not require target screening, i.e., offering a single pair of prices is optimal for the seller. The optimal mechanism is a subsidized posted price that provides a subsidy to the buyer to encourage participation and then charges a fixed unit price for each item sold. The seller's problem is a challenging non-linear mechanism design problem, and a key technical contribution of our work is to provide a novel approach to analyze non-linear pricing contracts.
View details
Federated Variational Inference: Towards Improved Personalization and Generalization
Elahe Vedadi
Josh Dillon
Philip Mansfield
Karan Singhal
Arash Afkanpour
Warren Morningstar
AAAI Federated Learning on the Edge Symposium (2024)
Preview abstract
Conventional federated learning algorithms train a single global model by leveraging all participating clients' data. However, due to heterogeneity in client generative distributions and predictive models, these approaches may not appropriately approximate the predictive process, converge to an optimal state, or generalize to new clients. We study personalization and generalization in stateless cross-device federated learning setups assuming heterogeneity in client data distributions and predictive models. We first propose a hierarchical generative model and formalize it using Bayesian Inference. We then approximate this process using Variational Inference to train our model efficiently. We call this algorithm Federated Variational Inference (FedVI). We use PAC-Bayes analysis to provide generalization bounds for FedVI. We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks.
View details
Preview abstract
The evolution of AI is a pivotal moment in history, but it’s not the first time we have experienced technological advances that have changed how humans work. By looking at the advances in automobiles, we are reminded of the importance of focusing on our developers' needs and goals.
View details
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation
Akshita Jha
Sarah Laszlo
Rida Qadri
Chandan Reddy
ACL (2024)
Preview abstract
Recent studies have highlighted the issue of varying degrees of stereotypical depictions for different identity groups. However, these existing approaches have several key limitations, including a noticeable lack of coverage of identity groups in their evaluation and a limited range of associated stereotypes. Additionally, these studies often lack a critical distinction between inherently visual stereotypes, such as 'brown' or 'sombrero', and culturally influenced stereotypes like 'kind' or 'intelligent'. In this work, we address these limitations by grounding our evaluation of regional, geo-cultural stereotypes in the generated images from Text-to-Image models by leveraging existing textual resources. We employ existing stereotype benchmarks to evaluate stereotypes and focus exclusively on the identification of visual stereotypes within the generated images spanning 135 identity groups. We also compute the offensiveness across identity groups, and check the feasibility of identifying stereotypes automatically. Further, through a detailed case study and quantitative analysis, we reveal how the default representations of all identity groups have a more stereotypical appearance, and for historically marginalized groups, how the images across different attributes are visually more similar than other groups, even when explicitly prompted otherwise.
View details
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Kihyuk Sohn
Xiuye Gu
Fei-Fei Li
Lu Jiang
ECCV (2024)
Preview abstract
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach has two key design decisions. First, we use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities. Second, for memory and training efficiency, we use a window attention architecture tailored for joint spatial and spatiotemporal generative modeling. Taken together, these design decisions enable us to achieve state-of-the-art performance on established video (UCF-101 and Kinetics-600) and image (ImageNet) generation benchmarks without using classifier-free guidance. Finally, we also train a cascade of three models for the task of text-to-video generation, consisting of a base latent video diffusion model and two video super-resolution diffusion models, to generate videos of 512 × 896 resolution at 8 frames per second.
View details
Preview abstract
This paper discusses a method to inject text when training an ASR system without the need for upsampling the text sequence to match the length of the speech sequence.
View details
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Christina Tsalicoglou
Fabian Manhardt
Michael Niemeyer
3DV 2024 (2024)
Preview abstract
The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved in the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged that tries to harness diffusion models, trained on 2D images, for supervision of 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish-looking effect. Therefore, in this work we propose a novel method for generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.
View details
Towards Conversational Diagnostic AI
Anil Palepu
Khaled Saab
Jan Freyberg
Ryutaro Tanno
Amy Wang
Brenna Li
Nenad Tomašev
Karan Singhal
Le Hou
Albert Webson
Kavita Kulkarni
Sara Mahdavi
Juro Gottweis
Joelle Barral
Kat Chou
Arxiv (2024) (to appear)
Preview abstract
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
View details
Preview abstract
We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSformer) that can predict accurate NOCS, instance masks and poses from 2D object detections across diverse classes. It is the first NOCS model that can generalize to a broad range of classes when prompted with 2D boxes. We evaluate our model on the task of 3D oriented bounding box prediction, where it achieves comparable results to state-of-the-art 3D detection methods such as Cube R-CNN. Unlike other 3D detection methods, our model also provides detailed and accurate 3D object shape and segmentation. We propose a novel benchmark for the task of NOCS prediction based on OmniNOCS, which we hope will serve as a useful baseline for future work in this area. Our dataset and code are available at the project website: https://omninocs.github.io
View details
Analysis of objective and subjective sleep metrics and smartphone usage patterns
Conor Heneghan
Daniel McDuff
Ari Winbush
Nicholas Allen
John Hernandez
Allen Jiang
Andrew Barakat
Logan Schneider
Benjamin Nelson
Ben Yetton
Preview abstract
Conor Heneghan, Daniel McDuff, Ari Winbush, Nicholas Allen, John Hernandez, Allen Jiang, Andrew Barakat, Logan Schneider, Benjamin Nelson, Ben Yetton
Consumer Health Research Team, Google Inc.
Department of Psychology, University of Oregon
Verily Life Sciences
Department of Psychiatry, Harvard Medical School and Beth Israel Deaconess Medical Center
Introduction: The Digital Wellbeing Study is an IRB-approved joint study between the University of Oregon and Google to investigate how smartphone usage interacts with objective and subjective parameters of well-being such as sleep, exercise and stress. The study recruited a demographically diverse population who each wore a smartwatch and installed a smartphone app linked to the study. Participants completed demographic and health questionnaires including the PROMIS Sleep Disturbance (SD) Short Form. Aims of the study included determining (a) whether objective sleep duration was correlated with smartphone use, and (b) whether smartphone usage could predict the subjective self-reported sleep instrument.
Methods: There was sufficient data from 7,499 users to conduct a population modeling analysis. An Ordinary Least Squares linear model was used as a predictor of each subject’s average total sleep time (TST) and their SD t-score. The inputs to the model included demographics, and population z-scored activity measures (steps, sedentary time, time driving, time at work, home and other locations, phone screen time, frequency of phone unlocks) over seven days prior to the survey.
Results: The activity measures and baseline demographics could only explain a small amount of the overall variance in TST and SD (R^2 = 0.04 for TST and R^2 = 0.05 for SD). Phone screen time was a statistically significant predictor of both TST (-8.19 mins, p < 0.001) and self-reported sleep disruption (0.611 t-score units, p < 0.001). The number of phone unlocks was a predictor of variability in TST (-3.33 mins, p < 0.001) suggesting that longer session times are correlated with greater TST variability. The effects are minimal (e.g., a subject who has one standard deviation greater phone screen time than average would be predicted to only see a 2% reduction in TST, and a 0.6% increase in perceived sleep disturbance). Time driving and step count were also minor predictors of SD and TST.
Conclusion: At a population level, average activity measures from wearables and smartphones such as steps, smartphone usage time, sedentary activity etc. are limited predictors of objective sleep metrics such as Total Sleep Time, and subjective sleep metrics such as the PROMIS Sleep Disturbance t-score.
Support (if any): This research was funded by Google Inc.
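To make the Methods step concrete, the following minimal sketch fits an ordinary least squares line on a z-scored predictor, so that the slope reads as "minutes of TST per one standard deviation of screen time", the same unit as the effects reported above. The five data points are synthetic and purely illustrative, and the single-predictor fit is a simplifying assumption: the study's model used demographics and several activity measures together.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardize values to zero mean and unit (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def ols_one_predictor(z, y):
    """Least-squares intercept a and slope b for the model y ~ a + b * z."""
    zm, ym = mean(z), mean(y)
    b = (sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y))
         / sum((zi - zm) ** 2 for zi in z))
    a = ym - b * zm
    return a, b


# Synthetic example data (NOT from the study): daily screen hours and TST in minutes.
screen_hours = [2.0, 3.0, 4.0, 5.0, 6.0]
tst_minutes = [450.0, 440.0, 430.0, 420.0, 410.0]

a, b = ols_one_predictor(zscore(screen_hours), tst_minutes)
# With a z-scored predictor, the intercept is the average TST and b / a is the
# predicted relative change in TST for one standard deviation more screen time.
relative_change = b / a
```

Dividing the slope by the intercept is one way to express an absolute effect in minutes as a percentage of average sleep time, which is how an effect of a few minutes can correspond to a change of only a few percent.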
View details
A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models
Heather Cole-Lewis
Nenad Tomašev
Liam McCoy
Leo Anthony Celi
Alanna Walton
Akeiylah DeWitt
Philip Mansfield
Sushant Prakash
Joelle Barral
Ivor Horn
Karan Singhal
Nature Medicine (2024)
Preview abstract
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare.
View details
Preview abstract
In recommendation systems, the number of recommendable items (e.g., movies, music, products) has grown. When the set of recommendable items is large, training and evaluation of item recommendation models become computationally expensive. To lower this cost, it has become common to sample negative items. However, the recommendation quality can suffer from biases introduced by traditional negative sampling mechanisms. In this work, we demonstrate the benefits of correcting the bias introduced by sampling of negatives. We first provide a sampled-batch version of the well-studied WARP and LambdaRank methods. Then, we present how these methods can benefit from improved ranking estimates. Finally, we evaluate the recommendation quality as a result of correcting rank estimates and demonstrate that WARP and LambdaRank can be learned efficiently with negative sampling and our proposed correction technique.
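As background for the rank estimates mentioned in this abstract, the sketch below compares a positive item's exact rank with an estimate computed from a uniform sample of negatives. The scaling rule used here (violator count multiplied by the inverse sampling fraction) is a generic textbook-style estimate chosen for illustration; it is an assumption and not the correction technique proposed in the paper. The scores are random placeholders.

```python
import random


def true_rank(scores, pos):
    """1-based rank of the positive item among all items (higher score ranks first)."""
    return 1 + sum(1 for i, s in enumerate(scores) if i != pos and s > scores[pos])


def sampled_rank_estimate(scores, pos, m, rng):
    """Estimate the rank from m uniformly sampled negatives by scaling the violator count."""
    negatives = [i for i in range(len(scores)) if i != pos]
    sample = rng.sample(negatives, m)  # uniform sampling without replacement
    violators = sum(1 for i in sample if scores[i] > scores[pos])
    return 1 + violators * len(negatives) / m


rng = random.Random(0)
scores = [rng.random() for _ in range(1000)]  # placeholder model scores
pos = 0                                       # index of the positive item

exact = true_rank(scores, pos)
full = sampled_rank_estimate(scores, pos, m=999, rng=rng)    # every negative sampled
approx = sampled_rank_estimate(scores, pos, m=100, rng=rng)  # 10% of the cost
```

Sampling all negatives reproduces the exact rank, while the smaller sample trades accuracy for cost. Rank-dependent losses such as WARP and LambdaRank weight updates by this kind of estimate, which is why the quality of the estimate matters.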
View details
Preview abstract
In-DRAM Stochastic and Approximate Counting (DSAC) is a recently published algorithm that aims to mitigate Rowhammer at low cost. Existing in-DRAM counter-based schemes keep track of row activations and issue Targeted Row Refresh (TRR) upon detecting a concerning pattern. However, because their tracking ability is insufficient, they are vulnerable to attacks that use decoy rows. DSAC claims to improve upon existing TRR mitigation by filtering out decoy-row accesses, so they cannot saturate the limited number of counters available for detecting Rowhammer, promising a reliable mitigation without the area cost of deterministic and provable schemes such as per-row activation counting (PRAC).
In this paper, we analyze DSAC and discover some gaps that make it vulnerable to Rowhammer and Rowpress attacks. The main focus of this work is a novel attack named SoothSayer that targets the counter replacement policy in DSAC by cloning the random number generator. We describe and simulate this attack, and establish its efficacy. Finally, we discuss other weaknesses in DSAC.
View details