Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10132 publications
    Preview abstract At Google, we’ve been running a quarterly large-scale survey with developers since 2018. In this article, we will discuss how we run EngSat, some of our key learnings over the past 6 years, and how we’ve evolved our approach to meet new needs and challenges. View details
    Analysis of objective and subjective sleep metrics and smartphone usage patterns
    Conor Heneghan
    Daniel McDuff
    Ari Winbush
    Nicholas Allen
    John Hernandez
    Allen Jiang
    Andrew Barakat
    Logan Schneider
    Benjamin Nelson
    Ben Yetton
    Preview abstract Analysis of objective and subjective sleep metrics and smartphone usage patterns Conor Heneghan, , Daniel McDuff, Ari Winbush, Nicholas Allen, John Hernandez, Allen Jiang,, Andrew Barakat, Logan Schneider, Benjamin Nelson, Ben Yetton Consumer Health Research Team, Google Inc. Department of Psychology, University of Oregon Verily Life Sciences Department of Psychiatry, Harvard Medical School and Beth Israel Deaconess Medical Center Introduction: The Digital Wellbeing Study is an IRB approved joint study between the University of Oregon and Google to investigate how smartphone usage interacts with objective and subjective parameters of well-being such as sleep, exercise and stress. The study recruited a demographically diverse population who each wore a smartwatch and installed a smartphone app linked to the study. Participants completed demographic and health questionnaires including the PROMIS Sleep Disturbance (SD) Short Form. Aims of the study included (a) whether objective sleep duration was correlated with smartphone use, and (b) whether smartphone usage could predict the subjective self reported sleep instrument. Methods: There was sufficient data from 7,499 users to conduct a population modeling analysis. An Ordinary Least Squares linear model was used as a predictor of each subject’s average total sleep time (TST) and their SD t-score. The inputs to the model included demographics, and population z-scored activity measures (steps, sedentary time, time driving, time at work, home and other locations, phone screen time, frequency of phone unlocks) over seven days prior to the survey. Results: The activity measures and baseline demographics could only explain a small amount of the overall variance in TST and SD (R^2=0.04 for TST and R^2=0.05 for SD). Phone screen time was a statistically significant predictor of both TST (-8.19 mins, p< 0.001) and self-reported sleep disruption (0.611 t-score units, p< 0.001). The number of phone unlocks was a predictor of variability in TST (-3.33 mins, p< 0.001) suggesting that longer session times are correlated with greater TST variability. The effects are minimal (e.g., a subject who has one standard deviation greater phone screen time than average would be predicted to only see a 2% reduction in TST, and a 0.6% increase in perceived sleep disturbance). Time driving and step count were also minor predictors of SD and TST. Conclusion: At a population level, average activity measures from wearables and smartphones such as steps, smartphone usage time, sedentary activity etc. are limited predictors of objective sleep metrics such as Total Sleep Time, and subjective sleep metrics such as the PROMIS Sleep Disturbance t-score. Support (if any): This research was funded by Google Inc. View details
    Preview abstract In this article, we study the evolution of Android permissions. We describe the rationale behind key changes in Android’s permission model and disclose two permission-related security vulnerabilities we discovered. Finally, we provide developers actionable insights to proactively address permission-related security and privacy risks during development. View details
    Preview abstract New regulations and increased awareness of data privacy have led to the deployment of new and more efficient differentially private mechanisms across public institutions and industries. Ensuring the correctness of these mechanisms is therefore crucial to ensure the proper protection of data. However, since differential privacy is a property of the mechanism itself, and not of an individual output, testing whether a mechanism is differentially private is not a trivial task. While ad hoc testing techniques exist under specific assumptions, no concerted effort has been made by the research community to develop a flexible and extendable tool for testing differentially private mechanisms. This paper introduces DP-Auditorium as a step advancing research in this direction. DP-Auditorium abstracts the problem of testing differential privacy into two steps: (1) measuring the distance between distributions, and (2) finding neighboring datasets where a mechanism generates output distributions maximizing such distance. From a technical point of view, we propose three new algorithms for evaluating the distance between distributions. While these algorithms are well-established in the statistics community, we provide new estimation guarantees that exploit the fact that we are only interested in verifying whether a mechanism is differentially private, and not in obtaining an exact estimate of the distance between two distributions. DP-Auditorium is easily extensible, as demonstrated in this paper by implementing a well-known approximate differential privacy testing algorithm into our library. We provide an extensive comparison to date of multiple testers across varying sample sizes and differential privacy parameters, demonstrating that there is no single tester that dominates all others, and that a combination of different techniques is required to ensure proper testing of mechanisms. View details
    Enhancing Trust and Safety in Digital Payments: An LLM-Powered Approach
    Anant Modwal
    Govind Kaushal
    Ramanan Balakrishnan
    Shanay Shah
    Monu Agrawal
    Justin Lin
    Prakash Hariramani
    Priya Mandawat
    Rutvik Karve
    Naveen Madiraju
    Preview abstract Digital payment systems have revolutionized financial transactions, offering unparalleled convenience and accessibility to users worldwide. However, the increasing popularity of these platforms has also attracted malicious actors seeking to exploit their vulnerabilities for financial gain. To address this challenge, robust and adaptable scam detection mechanisms are crucial for maintaining the trust and safety of digital payment ecosystems. This paper presents a comprehensive approach to scam detection, focusing on the Unified Payments Interface (UPI) in India, Google Pay (GPay) as a specific use case. The approach leverages Large Language Models (LLMs) to enhance scam classification accuracy and designs a digital assistant to aid human reviewers in identifying and mitigating fraudulent activities. The results demonstrate the potential of LLMs in augmenting existing machine learning models and improving the efficiency, accuracy, quality, and consistency of scam reviews, ultimately contributing to a safer and more secure digital payment landscape. Our evaluation of the Gemini Ultra model on curated transaction data showed a 93.33% accuracy in scam classification. Furthermore, the model demonstrated 89% accuracy in generating reasoning for these classifications. A promising fact, the model identified 32% new accurate reasons for suspected scams that human reviewers had not included in the review notes. View details
    PROMPT: A Fast and Extensible Memory Profiling Framework
    Ziyang Xu
    Yebin Chon
    Yian Su
    Zujun Tan
    Simone Campanoni
    David I. August
    Proceedings of the ACM on Programming Languages, 8, Issue OOPSLA (2024)
    Preview abstract Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT where all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1× respectively. To further underscore PROMPT's impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers. View details
    Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech
    Bob MacDonald
    Rus Heywood
    Richard Cave
    Katie Seaver
    Antoine Desjardins
    Jordan Green
    Journal of Speech, Language, and Hearing Research (2024) (to appear)
    Preview abstract Purpose: This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy. Method: Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (1) one speaker-independent ASR system trained and optimized for typical speech and (2) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word Error Rates (WERs) were calculated for each speech mode, read vs conversational, and subject. Linear mixed-effect models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap. Results: We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy. Conclusions: We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively. View details
    Preview abstract Examines how a Google R&D programme sought to accelerate a future of safer, cheaper and more ubiquitous fusion and other nuclear energy. Discusses how the programme was started, its major components: fusion, edge-of-technology, and policy advocacy supporting innovation. Shows successful exits for each part. Beyond telling the sotry, an intents is to show how to move the needle, and get people to think about how they might also help, and show Google has made a difference. Timing of publication marks the 10th anniversary of programme's start. View details
    Security & Privacy Product Inclusion
    Dave Kleidermacher
    Emmanuel Arriaga
    Eric Wang
    Sebastian Porst
    Roger Piqueras Jover
    Arxive (2024)
    Preview abstract In this paper, we explore the challenges of ensuring security and privacy for users from diverse demographic backgrounds. We propose a threat modeling approach to identify potential risks and countermeasures for product inclusion in security and privacy. We discuss various factors that can affect a user's ability to achieve a high level of security and privacy, including low-income demographics, poor connectivity, shared device usage, ML fairness, etc. We present results from a global security and privacy user experience survey and discuss the implications for product developers. Our work highlights the need for a more inclusive approach to security and privacy and provides a framework for researchers and practitioners to consider when designing products and services for a diverse range of users. View details
    Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns
    Ariel Goldstein
    Avigail Grinstein-Dabush
    Haocheng Wang
    Zhuoqiao Hong
    Bobbi Aubrey
    Samuel A. Nastase
    Zaid Zada
    Eric Ham
    Harshvardhan Gazula
    Eliav Buchnik
    Werner Doyle
    Sasha Devore
    Patricia Dugan
    Roi Reichart
    Daniel Friedman
    Orrin Devinsky
    Adeen Flinker
    Uri Hasson
    Nature Communications (2024)
    Preview abstract Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we densely record the neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. We demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns using stringent zero-shot mapping. The common geometric patterns allow us to predict the brain embedding of a given left-out word in IFG based solely on its geometrical relationship to other nonoverlapping words in the podcast. Furthermore, we show that contextual embeddings better capture the geometry of IFG embeddings than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain. View details
    Embedding-Aligned Language Models
    Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
    Preview abstract We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent it trained using a significantly smaller language model to iteratively stir the LLM's generation towards optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset, on extrapolation tasks for content gap to satisfy latent user demand, and multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations. View details
    Guidelines for Productivity in Virtual Reality
    Andrea Colaco
    ACM Interactions, 31 (2024), pp. 46-53
    Preview abstract Most of our interactions with digital content currently occur inside 2D screens, however moving from that format to immersive setups brings a paradigm shift. From content inside the screen to users inside the content. This change requires a revisit to how we blend the analog and the digital and how we transfer content between the two modes. Perhaps it even asks for new guidelines too. While different solutions appear in the space, the dynamic range only seems to widen. We can start to see what works and what does not work so well, in an empirical or ethnographic approach, beyond laboratory studies. But if we want to accelerate adoption we need to further the understanding on how current tasks can be improved. How this new form of interaction can increase their productivity. In this paper we focus on analyzing and converging what we think works, and envisioning how this new set of immersive devices and interactions can enable productivity beyond already existing tools. View details
    SCOREQ: Speech Quality Assessment with Contrastive Regression
    Alessandro Ragano
    Andrew Hines
    NeurIPS (2024) (to appear)
    Preview abstract In this paper, we present SCOREQ, a novel approach for speech quality prediction. SCOREQ is a triplet loss function for contrastive regression that addresses the domain generalisation shortcoming exhibited by state of the art no-reference speech quality metrics. In the paper we: (i) illustrate the problem of L2 loss training failing at capturing the continuous nature of the mean opinion score (MOS) labels; (ii) demonstrate the lack of generalisation through a benchmarking evaluation across several speech domains; (iii) outline our approach and explore the impact of the architectural design decisions through incremental evaluation; (iv) evaluate the final model against state of the art models for a wide variety of data and domains. The results show that the lack of generalisation observed in state of the art speech quality metrics is addressed by SCOREQ. We conclude that using a triplet loss function View details
    Preview abstract We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information. View details
    Quantifying urban park use in the USA at scale: empirical estimates of realised park usage using smartphone location data
    Michael T Young
    Swapnil Vispute
    Stylianos Serghiou
    Akim Kumok
    Yash Shah
    Kevin J. Lane
    Flannery Black-Ingersoll
    Paige Brochu
    Monica Bharel
    Sarah Skenazy
    Shailesh Bavadekar
    Mansi Kansal
    Evgeniy Gabrilovich
    Gregory A. Wellenius
    Lancet Planetary Health (2024)
    Preview abstract Summary Background A large body of evidence connects access to greenspace with substantial benefits to physical and mental health. In urban settings where access to greenspace can be limited, park access and use have been associated with higher levels of physical activity, improved physical health, and lower levels of markers of mental distress. Despite the potential health benefits of urban parks, little is known about how park usage varies across locations (between or within cities) or over time. Methods We estimated park usage among urban residents (identified as residents of urban census tracts) in 498 US cities from 2019 to 2021 from aggregated and anonymised opted-in smartphone location history data. We used descriptive statistics to quantify differences in park usage over time, between cities, and across census tracts within cities, and used generalised linear models to estimate the associations between park usage and census tract level descriptors. Findings In spring (March 1 to May 31) 2019, 18·9% of urban residents visited a park at least once per week, with average use higher in northwest and southwest USA, and lowest in the southeast. Park usage varied substantially both within and between cities; was unequally distributed across census tract-level markers of race, ethnicity, income, and social vulnerability; and was only moderately correlated with established markers of census tract greenspace. In spring 2019, a doubling of walking time to parks was associated with a 10·1% (95% CI 5·6–14·3) lower average weekly park usage, adjusting for city and social vulnerability index. The median decline in park usage from spring 2019 to spring 2020 was 38·0% (IQR 28·4–46·5), coincident with the onset of physical distancing policies across much of the country. We estimated that the COVID-19-related decline in park usage was more pronounced for those living further from a park and those living in areas of higher social vulnerability. Interpretation These estimates provide novel insights into the patterns and correlates of park use and could enable new studies of the health benefits of urban greenspace. In addition, the availability of an empirical park usage metric that varies over time could be a useful tool for assessing the effectiveness of policies intended to increase such activities. View details