Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 10133 publications
Preview abstract
Cloud computing architectures have become popular largely because they are more scalable and economical than the alternatives. However, they bring their own challenges for workload scheduling and resource utilization, because virtual machines (VMs) and applications must share resources such as servers and storage. Historically, workload balancing and resource management have relied on manual configuration or simplistic heuristics, which do not optimize resource usage and performance effectively. In this technical brief, we propose an approach that uses unsupervised learning techniques to detect usage patterns and improve resource utilization, which in turn supports both optimal performance and an automatically balanced workload among VMs. We use clustering algorithms to group similar workloads and then allocate resources to each group based on its demand, so that resources are used more effectively and resource exhaustion is avoided. We also integrate anomaly detection methods into our system to identify and handle abnormal behavior, both in monitoring and in resource placement. We experiment with region traces from production workloads to demonstrate the benefits of our approach, showing marked improvements in workload balancing and resource utilization over current practices.
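As an illustration of the cluster-then-allocate idea in the abstract above, here is a minimal sketch. It is not the authors' method: the abstract does not name a clustering algorithm, so 1-D k-means is an assumption, and the function names (`kmeans_1d`, `allocate`), the demand figures, and the capacity of 100 units are all hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means with deterministic initial centroids."""
    ordered = sorted(values)
    # Deterministic initialisation: k evenly spaced points from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

def allocate(values, labels, k, capacity):
    """Split capacity across clusters in proportion to each cluster's total demand."""
    demand = [sum(v for v, lab in zip(values, labels) if lab == c) for c in range(k)]
    total = sum(demand)
    return [capacity * d / total for d in demand]

# Hypothetical per-VM CPU demand (fraction of one core).
cpu_demand = [0.1, 0.15, 0.12, 0.8, 0.9, 0.85]
labels, centroids = kmeans_1d(cpu_demand, k=2)
shares = allocate(cpu_demand, labels, k=2, capacity=100.0)
print(labels, [round(s, 1) for s in shares])
```

With this toy input the light and heavy VMs fall into separate groups, and the heavy group receives the larger share of capacity. A real system would cluster on several resource dimensions and add the anomaly detection step the abstract mentions.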
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Bradley Kim
Alonso Martinez
Yu-Chuan Su
Agrim Gupta
Lu Jiang
Jacob Walker
Neural Information Processing Systems (NeurIPS) (2024) (to appear)
Preview abstract
Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input.
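To make the "variable diffusion timesteps across the temporal dimension and across modalities" idea concrete, here is a minimal sketch of the forward noising step. It is an illustration under assumptions, not the paper's implementation: the linear beta schedule, the 1000 steps, uniform per-position timestep sampling, the toy latent shapes, and the helper names `alpha_bar` and `add_noise` are all hypothetical.

```python
import math
import random

NUM_STEPS = 1000  # assumed number of diffusion steps

def alpha_bar(t, num_steps=NUM_STEPS, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) under an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, timesteps, rng):
    """Forward-diffuse a sequence; x0[i] is a feature vector, timesteps[i] its noise level."""
    noisy = []
    for vec, t in zip(x0, timesteps):
        a = alpha_bar(t)
        noisy.append([math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                      for v in vec])
    return noisy

rng = random.Random(0)
seq_len, dim = 4, 3
# Toy latents: one feature vector per temporal position, per modality.
latents = {m: [[1.0] * dim for _ in range(seq_len)] for m in ("audio", "video")}

# A fixed-timestep setup would draw ONE t for everything; here every modality
# and every temporal position draws its own timestep.
timesteps = {m: [rng.randrange(NUM_STEPS + 1) for _ in range(seq_len)] for m in latents}

# Assumed conditioning trick: holding the audio at t = 0 leaves it clean, so the
# same model can be asked to generate video given audio.
timesteps["audio"] = [0] * seq_len

noisy = {m: add_noise(latents[m], timesteps[m], rng) for m in latents}
```

Because each portion of the input carries its own noise level, a single training run can cover many input-output combinations, which is the flexibility the abstract attributes to the mixture of noise levels.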
Quantifying urban park use in the USA at scale: empirical estimates of realised park usage using smartphone location data
Michael T Young
Swapnil Vispute
Stylianos Serghiou
Akim Kumok
Yash Shah
Kevin J. Lane
Flannery Black-Ingersoll
Paige Brochu
Monica Bharel
Sarah Skenazy
Shailesh Bavadekar
Mansi Kansal
Evgeniy Gabrilovich
Gregory A. Wellenius
Lancet Planetary Health (2024)
Preview abstract
Summary
Background: A large body of evidence connects access to greenspace with substantial benefits to physical and mental health. In urban settings where access to greenspace can be limited, park access and use have been associated with higher levels of physical activity, improved physical health, and lower levels of markers of mental distress. Despite the potential health benefits of urban parks, little is known about how park usage varies across locations (between or within cities) or over time.
Methods: We estimated park usage among urban residents (identified as residents of urban census tracts) in 498 US cities from 2019 to 2021 from aggregated and anonymised opted-in smartphone location history data. We used descriptive statistics to quantify differences in park usage over time, between cities, and across census tracts within cities, and used generalised linear models to estimate the associations between park usage and census tract level descriptors.
Findings: In spring (March 1 to May 31) 2019, 18·9% of urban residents visited a park at least once per week, with average use higher in northwest and southwest USA, and lowest in the southeast. Park usage varied substantially both within and between cities; was unequally distributed across census tract-level markers of race, ethnicity, income, and social vulnerability; and was only moderately correlated with established markers of census tract greenspace. In spring 2019, a doubling of walking time to parks was associated with a 10·1% (95% CI 5·6–14·3) lower average weekly park usage, adjusting for city and social vulnerability index. The median decline in park usage from spring 2019 to spring 2020 was 38·0% (IQR 28·4–46·5), coincident with the onset of physical distancing policies across much of the country. We estimated that the COVID-19-related decline in park usage was more pronounced for those living further from a park and those living in areas of higher social vulnerability.
Interpretation: These estimates provide novel insights into the patterns and correlates of park use and could enable new studies of the health benefits of urban greenspace. In addition, the availability of an empirical park usage metric that varies over time could be a useful tool for assessing the effectiveness of policies intended to increase such activities.
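The "per doubling of walking time" finding can be read as a multiplicative effect. The sketch below works through that arithmetic. It assumes the 10·1% reduction compounds with each further doubling (a log-linear relationship), which the abstract does not state explicitly; the function name `relative_usage` is hypothetical.

```python
import math

# From the abstract: 10.1% lower weekly park usage per doubling of walking time.
DOUBLING_EFFECT = 0.101

def relative_usage(walk_time_ratio):
    """Usage relative to baseline, assuming the per-doubling effect compounds."""
    doublings = math.log2(walk_time_ratio)
    return (1.0 - DOUBLING_EFFECT) ** doublings

# Walking time twice as long: one doubling.
print(round(relative_usage(2.0), 3))
# Walking time four times as long: two doublings, under the compounding assumption.
print(round(relative_usage(4.0), 3))
```

Under this assumption, a fourfold walking time corresponds to roughly 19% lower usage rather than 20%, because the second doubling applies to an already reduced baseline.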
UGIF-DataSet: A New Dataset for Cross-lingual, Cross-modal Sequential actions on the UI
Findings of the Association for Computational Linguistics: NAACL 2024
Preview abstract
Help documents are supposed to aid smartphone users in resolving queries such as "How to block calls from unknown numbers?". However, given a query, identifying the right help document, understanding instructions from the document, and using them to resolve the issue at hand is challenging. The user experience may be enhanced by converting the instructions in the help document to a step-by-step tutorial overlaid on the phone UI. Successful execution of this task requires overcoming research challenges in retrieval, parsing, and grounding in the multilingual-multimodal setting. For example, user queries in one language may have to be matched against instructions in another language, which in turn needs to be grounded in a multimodal UI in yet another language. Moreover, there isn’t any relevant dataset for such a task. In order to bridge this gap, we introduce UGIF-DataSet, a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone, containing 4,184 tasks across 8 languages. The instruction steps in UGIF-DataSet are available only in English, so the challenge involves operations in the cross-modal, cross-lingual setting. We compare the performance of different large language models for this task and find that the end-to-end task completion rate drops from 48% in English to 32% for other languages, demonstrating significant overall headroom for improvement. We are hopeful that UGIF-DataSet and our analysis will aid further research on the important problem of sequential task completion in the multilingual and multimodal setting.
Android Permissions: Evolution, Attacks, and Best Practices
IEEE Security and Privacy (2024) (to appear)
Preview abstract
In this article, we study the evolution of Android permissions. We describe the rationale behind key changes in Android’s permission model and disclose two permission-related security vulnerabilities we discovered. Lastly, we provide developers actionable insights to proactively address permission-related security and privacy risks during development.
Unveiling Privacy Perspectives about Mobile Health Apps on a Large Scale
PETS workshop: Privacy, Safety and Trust for Mobile Health Apps (2024)
Preview abstract
In this paper we study users' opinions about the privacy of their mobile health apps. We look at what they write in app reviews in the 'Health & Fitness' category on the Google Play store. We identified 2832 apps in this category (based on 1K minimum installs). Using NLP/LLM analyses, we find that 76% of these apps have at least some privacy reviews. In total this yields over 164,000 reviews about privacy, from over 150 countries and in 25 languages. Our analyses identify top themes and offer an approximation of how widespread these issues are around the world. We show that the top 4 themes - Data Sharing and Exposure, Permission Requests, Location Tracking and Data Collection - are issues of concern in over 70 countries. Our automatically generated thematic summaries reveal interesting aspects that deserve further research around user suspicions (unneeded data collection), user requests (more fine-grained control over data collection and data access), as well as user behavior (uninstalling apps).
Relational Affect in Dyadic Interactions
CHI Conference on Human Factors in Computing Systems (2024)
Preview abstract
Relational affect is the affective response (encompassing emotion, expression, feeling) that emerges from an interaction between two people. The case study presented here introduces the concept of relational affect through a human perceptual rating task. Forty-five raters watched short video clips of two people interacting and described their perceived emotion of the individuals and that of the overall interaction. Our qualitative analysis of the rater responses showed that raters used a variety of schemes to reason about emotion, including expressions, context, and perceived appraisal of the event. These reasoning schemes were notably different for perceived individual emotion and relational affect. Our findings show that the vocabulary used for relational affect is distinct from that of individual emotion, and that relational affect as a phenomenon deepens our understanding of social interactions and moves the field a step closer to realizing the goal of fluid interactions between people and technology.
Website Data Transparency in the Browser
Sebastian Zimmeck
Daniel Goldelman
Owen Kaplan
Logan Brown
Justin Casler
Judeley Jean-Charles
Joe Champeau
24th Privacy Enhancing Technologies Symposium (PETS 2024), PETS (to appear)
Preview abstract
Data collection by websites and their integrated third parties is often not transparent. We design privacy interfaces for the browser to help people understand who is collecting which data from them. In a proof-of-concept browser extension, Privacy Pioneer, we implement a privacy popup, a privacy history interface, and a watchlist to notify people when their data is collected. For detecting location data collection, we develop a machine learning model based on TinyBERT, which reaches an average F1 score of 0.94. We supplement our model with deterministic methods to detect trackers, collection of personal data, and other monetization techniques. In a usability study with 100 participants, 82% found Privacy Pioneer easy to understand and 90% found it useful, indicating the value of privacy interfaces directly integrated in the browser.
Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning
Karel Král
Marina Zhang
Transactions on Cryptographic Hardware and Embedded Systems (TCHES), IACR (2024)
Preview abstract
To make cryptographic processors more resilient against side-channel attacks, engineers have developed various countermeasures. However, the effectiveness of these countermeasures is often uncertain, as it depends on the complex interplay between software and hardware. Assessing a countermeasure’s effectiveness using profiling techniques or machine learning so far requires significant expertise and effort to be adapted to new targets, which makes those assessments expensive. We argue that including cost-effective automated attacks will help chip design teams to quickly evaluate their countermeasures during the development phase, paving the way to more secure chips. In this paper, we lay the foundations toward such an automated system by proposing GPAM, the first deep-learning system for power side-channel analysis that generalizes across multiple cryptographic algorithms, implementations, and side-channel countermeasures without the need for manual tuning or trace preprocessing. We demonstrate GPAM’s capability by successfully attacking four hardened hardware-accelerated elliptic-curve digital-signature implementations. We showcase GPAM’s ability to generalize across multiple algorithms by attacking a protected AES implementation and achieving comparable performance to state-of-the-art attacks, but without manual trace curation and within a limited budget. We release our data and models as an open-source contribution to allow the community to independently replicate our results and build on them.
Analyzing Prospects for Quantum Advantage in Topological Data Analysis
Dominic W. Berry
Yuan Su
Casper Gyurik
Robbie King
Joao Basso
Abhishek Rajput
Nathan Wiebe
Vedran Dunjko
PRX Quantum, 5 (2024), pp. 010319
Preview abstract
Lloyd et al. were first to demonstrate the promise of quantum algorithms for computing Betti numbers in persistent homology (a way of characterizing topological features of data sets). Here, we propose, analyze, and optimize an improved quantum algorithm for topological data analysis (TDA) with reduced scaling, including a method for preparing Dicke states based on inequality testing, a more efficient amplitude estimation algorithm using Kaiser windows, and an optimal implementation of eigenvalue projectors based on Chebyshev polynomials. We compile our approach to a fault-tolerant gate set and estimate constant factors in the Toffoli complexity. Our analysis reveals that super-quadratic quantum speedups are only possible for this problem when targeting a multiplicative error approximation and the Betti number grows asymptotically. Further, we propose a dequantization of the quantum TDA algorithm that shows that having exponentially large dimension and Betti number are necessary, but insufficient conditions, for super-polynomial advantage. We then introduce and analyze specific problem examples for which super-polynomial advantages may be achieved, and argue that quantum circuits with tens of billions of Toffoli gates can solve some seemingly classically intractable instances.
Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan
Atilla Kiraly
Corbin Cunningham
Ryan Najafi
Jie Yang
Chuck Lau
Diego Ardila
Scott Mayer McKinney
Rory Pilgrim
Mozziyar Etemadi
Sunny Jansen
Lily Peng
Shravya Shetty
Neeral Beladia
Krish Eswaran
Radiology: Artificial Intelligence (2024)
Preview abstract
Lung cancer is the leading cause of cancer death worldwide with 1.8 million deaths in 2020 [1]. Studies have concluded that low-dose computed tomography lung cancer screening can reduce mortality by up to 61% [2] and updated 2021 US guidelines expanded eligibility. As screening efforts rise, AI can play an important role, but must be unobtrusively integrated into existing clinical workflows. In this work, we introduce a state-of-the-art, cloud-based AI system providing lung cancer risk assessments without requiring any user input. We demonstrate its efficacy in assisting lung cancer screening under both US and Japanese screening settings using different patient populations and screening protocols. Technical improvements over a previously described system include a focus on earlier cancer detection for improved accuracy, introduction of an effective assistive user interface, and a system designed to integrate into typical clinical workflows. The stand-alone AI system was evaluated on 3085 individuals achieving area under the curve (AUC) scores of 91.7% (95%CI [89.6, 95.2]), 93.3% (95%CI [90.2, 95.7]), and 89.1% (95%CI [77.7, 97.3]) on three datasets (two from US and one from Japan), respectively. To evaluate the system’s assistive ability, we conducted two retrospective multi-reader multi-case studies on 627 cases read by experienced board certified radiologists (average 20 years of experience [7,40]) using local PACS systems in the respective US and Japanese screening settings. The studies measured the reader’s level of suspicion (LoS) and categorical responses for scores and management recommendations under country-specific screening protocols. The radiologists’ AUC for LoS increased with AI assistance by 2.3% (95%CI [0.1-4.5], p=0.022) for the US study and by 2.3% (95%CI [-3.5-8.1], p=0.179) for the Japan study. Specificity for recalls increased by 5.5% (95%CI [2.7-8.5], p<0.0001) for the US and 6.7% (95%CI [4.7-8.7], p<0.0001) for the Japan study.
No significant reduction in other metrics occurred. This work advances the state of the art in lung cancer detection, introduces generalizable interface concepts that can be applicable to similar AI applications, and demonstrates its potential impact on diagnostic AI in global lung cancer screening with results suggesting a substantial drop in unnecessary follow-up procedures without impacting sensitivity.
Developer Ecosystems for Software Safety
Commun. ACM, 67(6) (2024), 52–60
Preview abstract
This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth.
Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications.
Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers.
Preview abstract
Inter-sentence pauses are the silences that occur between sentences in a paragraph or a dialogue.
They are an important aspect of long-form speech prosody, as they can affect the naturalness, intelligibility, and effectiveness of communication.
However, the user perception of inter-sentence pauses in long-form speech synthesis is not well understood. Previous work often evaluates pause modelling in conjunction with other prosodic features making it hard to explicitly study how raters perceive differences in inter-sentence pause lengths.
In this paper, using multiple text-to-speech (TTS) datasets that cover different content types, domains, and settings, we investigate how sensitive raters are to changes to the durations of inter-sentence pauses in long-form speech by comparing ground truth audio samples with renditions that have manipulated pause durations.
This experimental design is meant to allow us to draw conclusions regarding the utility that can be expected from similar evaluations when applied to synthesized long-form speech.
We find that, using standard evaluation methodologies, raters are not sensitive to variations in pause lengths unless these deviate exceedingly from the norms or expectations of the speech context.
Making Images from Images: Tightly Constrained Parallel Denoising
Ashwin Baluja
European Conference on Computer Vision, AI for Visual Arts Workshop and Challenges (2024)
Preview abstract
We present methods to transform an image into a novel one of any subject matter simply by rearranging the image’s tiles. Our method extends and improves recent work in the generation of optical illusions by discovering the optimal arrangement of the image’s tiles simultaneously with the image generation. In addition to producing images that more accurately represent the subject matter, this technique allows us to address a much broader class of problems than previously possible. By learning the image transforms, we allow any source image to be pre-specified; any existing image (e.g. the Mona Lisa) can be transformed to a novel subject. We formulate this as a tightly constrained optimization problem and address it through alternating the steps of image diffusion and energy minimization using optimal matching. Under our formulation, a simple method to extend this to infinite copies of the source image is also given. Unlike previous methods, as the number of tiles grows the problem becomes easier and the results become better.
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem
Conor McCullough
Randy Hsin
Chas Leichner
Shan Li
In Suk Chong
Andrew Howard
Lukasz Lew
Sherief Reda
Ville-Mikko Rautio
Daniele Moro
Conference on Computer Vision and Pattern Recognition (2024) (to appear)
Preview abstract
Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations, which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling, dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA efficiency metrics such as Arithmetic Computation Effort (ACE). In this paper, we propose ACEv2 - an extended version of ACE which offers a better alignment with the inference cost of quantized models and their energy consumption on ML hardware. Moreover, we introduce PikeLPN, a model that addresses these efficiency issues by applying quantization to both elementwise operations and multiply-accumulate operations. In particular, we present a novel quantization technique for batch normalization layers named QuantNorm, which allows for quantizing the batch normalization parameters without compromising the model performance. Additionally, we propose applying Double Quantization, where the quantization scaling parameters are quantized. Furthermore, we recognize and resolve the issue of distribution mismatch in Separable Convolution layers by introducing Distribution-Heterogeneous Quantization, which enables quantizing them to low precision. PikeLPN achieves Pareto-optimality in the efficiency-accuracy trade-off with up to 3X efficiency improvement compared to SOTA low-precision models.