Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10129 publications
    Statistical Analysis of Cardiovascular Diseases Dataset of BRFSS
    Ashank Anshuman
    Aakarshit Uppal
    Indrajit Mukherjee
    Open Access Library Journal, 11 (2024)
    Cardiovascular Diseases (CVDs) remain a leading cause of death in the United States. These diseases, including coronary heart disease, heart attack, and stroke, pose significant health risks. Accurate prediction of CVD probability can aid in prevention and management. To address this challenge, we analyzed data from the Behavioral Risk Factor Surveillance System (BRFSS) spanning 1995-2017. We developed innovative methods to handle missing data and normalize values. Deep learning models were employed to predict risk factors and, subsequently, the likelihood of CVDs. Our models were implemented using TensorFlow and trained on a high-performance computing server. The models predicted risk factors with over 90% accuracy, enabling targeted interventions. We predicted CVD probability with greater than 95% accuracy, providing valuable insights for healthcare providers. An online portal was developed to forecast CVD trends over the next 31 years, facilitating proactive planning and resource allocation.
    Taming Self-Training for Open-Vocabulary Object Detection
    Shiyu Zhao
    Samuel Schulter
    Zhixing Zhang
    Vijay Kumar B G
    Yumin Suh
    Manmohan Chandraker
    Dimitris N. Metaxas
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
    Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distribution changes of PLs. To address these challenges, we propose SAS-Det, which tames self-training for OVD from two key perspectives. First, we present a split-and-fusion (SAF) head that splits a standard detection head into an open-branch and a closed-branch. This design can reduce noisy supervision from pseudo boxes. Moreover, the two branches learn complementary knowledge from different training data, significantly enhancing performance when fused together. Second, in our view, unlike in closed-set tasks, the PL distributions in OVD are solely determined by the teacher model. We introduce a periodic update strategy to decrease the number of updates to the teacher, thereby decreasing the frequency of changes in PL distributions, which stabilizes the training process. Extensive experiments demonstrate SAS-Det is both efficient and effective. SAS-Det outperforms recent models of the same scale by a clear margin and achieves 37.4 AP50 and 29.1 APr on novel categories of the COCO and LVIS benchmarks, respectively.
    Hovering Over the Key to Text Input in XR
    Diar Abdlkarim
    Arpit Bhatia
    Stuart Macgregor
    Jason Fotso-Puepi
    Hasti Seifi
    Massimiliano Di Luca
    Karan Ahuja
    Virtual, Mixed, and Augmented Reality (XR) technologies hold immense potential for transforming productivity beyond the PC. Therefore, there is a critical need for improved text input solutions for XR. However, achieving efficient text input in these environments remains a significant challenge. This paper examines the current landscape of XR text input techniques, focusing on the importance of keyboards (both physical and virtual) as essential tools. We discuss the unique challenges and opportunities presented by XR, synthesizing key trends from existing solutions.
    In the present digital era, data-driven decision-making is essential for the success of collaborative workplaces. This paper gives a comprehensive analysis of how data engineering, cloud storage, and business intelligence synergistically empower teams. We look at the fundamental principles of data engineering, focusing on the design, construction, and management of scalable data pipelines. The role of cloud storage is explored, highlighting its ability to provide scalable, secure, and accessible data solutions. In addition, we examine business intelligence tools and their capacity to turn raw data into meaningful insights. Through case studies and empirical data, we illustrate the transformative impact of these technologies on team productivity, collaboration, and decision-making processes. This analysis highlights the importance of integrating robust data engineering practices, leveraging cloud storage solutions, and using sophisticated business intelligence tools to establish data-empowered collaborative environments.
    Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
    Shuo Yang
    Aniruddh Nath
    Yang Liu
    Li Wei
    Shawn Andrews
    Maciej Kula
    Jarrod Kahn
    Zhe Zhao
    Lichan Hong
    Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
    Leveraging Virtual Reality to Enhance Diversity and Inclusion training at Google
    Karla Brown
    Patrick Gage Kelley
    Leonie Sanderson
    2024 CHI Conference on Human Factors in Computing Systems, ACM
    Virtual reality (VR) has emerged as a promising educational training method, offering a more engaging and immersive experience than traditional approaches. In this case study, we explore its effectiveness for diversity, equity, and inclusion (DEI) training, with a focus on how VR can help participants better understand and appreciate different perspectives. We describe the design and development of a VR training application that aims to raise awareness about unconscious biases and promote more inclusive behaviors in the workplace. We report initial findings based on the feedback of Google employees who took our training and found that VR appears to be an effective way to enhance DEI training. In particular, participants reported that VR training helped them better recognize biases and how to effectively respond to them. However, our findings also highlight some challenges with VR-based DEI training, which we discuss in terms of future research directions.
    Machine learning has a pseudoscience problem. An abundance of ethical issues arising from the use of machine learning (ML)-based technologies, by now well documented, is inextricably entwined with the systematic epistemic misuse of these tools. We take a recent resurgence of deep learning-assisted physiognomic research as a case study in the relationship between ML-based pseudoscience and attendant social harms, the standard purview of “AI ethics.” In practice, the epistemic and ethical dimensions of ML misuse often arise from shared underlying reasons and are resolvable by the same pathways. Recent use of ML toward the ends of predicting protected attributes from photographs highlights the need for philosophical, historical, and domain-specific perspectives of particular sciences in the prevention and remediation of misused ML.
    Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
    Nitesh Bharadwaj Gundavarapu
    Luca Versari
    Kihyuk Sohn
    Agrim Gupta
    Xiuye Gu
    Alex Hauptmann
    Boqing Gong
    Lu Jiang
    ICLR (2024)
    While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this paper, we introduce MAGVIT-v2, a video tokenizer designed to generate concise and expressive tokens for both videos and images using a common token vocabulary. Equipped with this new tokenizer, we show that LLMs outperform diffusion models on standard image and video generation benchmarks including ImageNet and Kinetics. In addition, we demonstrate that our tokenizer surpasses the previously top-performing video tokenizer on two more tasks: (1) video compression comparable to the next-generation video codec (VVC) according to human evaluations, and (2) learning effective representations for action recognition tasks.
    Hardware-Assisted Fault Isolation: Going Beyond the Limits of Software-Based Sandboxing
    Anjo Vahldiek-Oberwagner
    Tal Garfinkel
    Deian Stefan
    Michael LeMay
    Evan Johnson
    Mohammadkazem Taram
    Chris Fallin
    Ravi Sahita
    Joey Rudek
    Shravan Narayan
    Dean Tullsen
    IEEE Micro (2024)
    Hardware-assisted Fault Isolation (HFI) is a minimal extension to current processors that supports secure, flexible, and efficient in-process isolation. HFI addresses the limitations of software-based isolation (SFI) systems, including runtime overheads, limited scalability, vulnerability to Spectre attacks, and limited compatibility with existing code. HFI can be seamlessly integrated into existing SFI systems (e.g., WebAssembly), or directly sandbox unmodified native binaries. To ease adoption, HFI proposes incremental changes to existing high-performance processors.
    The web utilizes permission prompts to moderate access to certain capabilities. We present the first investigation of user behavior and sentiment of this security and privacy measure on the web, using 28 days of telemetry data from more than 100M Chrome installations on desktop platforms and experience sampling responses from 25,706 Chrome users. Based on this data, we find that ignoring and dismissing permission prompts are most common for geolocation and notifications. Permission prompts are perceived as more annoying and interrupting when they are not allowed, and most respondents cite a rational reason for the decision they took. Our data also supports that the perceived availability of contextual information from the requesting website is associated with allowing access to a requested capability. More usable permission controls could facilitate adoption of best practices that address several of the identified challenges, and ultimately could lead to better user experiences and a safer web.
    A vast amount of human discussion, storytelling, content creation, and reporting now occurs on social media platforms. As such, social media posts are often quoted on web pages as context. In this paper, we argue that these quotations and their surrounding page context provide a rich, platform-independent source of data for studying the intersection of natural language and social media. We introduce a taxonomy of quotation roles that categorizes how social media posts are used within content. We release a dataset of 38M social quotes derived from the Common Crawl, and role labels for a subset assessed by human raters. We show that the interplay of accounts, roles, and topics across the web graph reveals valuable social diffusion patterns, and that roles can be predicted with fine-tuned large language models from web context.
    Trust is central to how developers engage with AI. In this article, we discuss what we learned from developers about their level of trust in AI-enhanced developer tooling, how we translated those findings into product design recommendations to support customization, and the challenges we encountered along the way.
    In this article, we study the evolution of Android permissions. We describe the rationale behind key changes in Android’s permission model and disclose two permission-related security vulnerabilities we discovered. Lastly, we provide developers with actionable insights to proactively address permission-related security and privacy risks during development.
    Prompt-Based Label-Aware Framework for Few-Shot Multi-Label Text Classification
    Thanakorn Thaminkaew
    Peerapon Vateekul
    IEEE Access, 12 (2024), pp. 28310-28322
    Prompt-based learning has demonstrated remarkable success in few-shot text classification, outperforming the traditional fine-tuning approach. This method transforms a text input into a masked language modeling prompt using a template, queries a fine-tuned language model to fill in the mask, and then uses a verbalizer to map the model’s output to a predicted class. Previous prompt-based text classification approaches were primarily designed for multi-class classification, taking advantage of the fact that the classes are mutually exclusive and one example belongs to only one class. However, these assumptions do not hold in the context of multi-label text classification, where labels often exhibit correlations with each other. Therefore, we propose a Prompt-based Label-Aware framework for Multi-Label text classification (PLAML) that addresses these challenges. Specifically, PLAML enhances prompt-based learning with three proposed techniques to improve the overall performance for multi-label classification. The techniques include (i) a token weighting algorithm that considers the correlations between labels, (ii) a template for augmenting training samples, making the training process label-aware, and (iii) a dynamic threshold mechanism, refining the prediction condition of each label. Extensive experiments on few-shot text classification across multiple datasets with various languages show that PLAML outperforms other baseline methods. We also analyzed the effect of each proposed technique to better understand how it suits the multi-label setting.
    DynaMITE-RL: A Dynamic Model for Improved Temporal Meta Reinforcement Learning
    Anthony Liang
    Erdem Biyik
    Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
    We introduce a meta-reinforcement learning (meta-RL) approach, called DynaMITE-RL, to perform approximate inference in environments where the latent information evolves slowly between subtrajectories called sessions. We identify three key modifications to contemporary meta-RL methods: consistency of latent information during sessions, session masking, and prior latent conditioning. We demonstrate the necessity of these modifications on various downstream applications from discrete Gridworld environments to continuous control and simulated robot assistive tasks and find that our approach significantly outperforms contemporary baselines.