Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10132 publications
    See Through Vehicles: Fully Occluded Vehicle Detection with Millimeter Wave Radar
    Chenming He
    Chengzhen Meng
    Chunwang He
    Beibei Wang
    Yubo Yan
    Yanyong Zhang
    MobiCom 2024: The 30th Annual International Conference On Mobile Computing And Networking
    Preview abstract A crucial task in autonomous driving is to continuously detect nearby vehicles. Problems thus arise when a vehicle is occluded and becomes “unseeable”, which may lead to accidents. In this study, we develop mmOVD, a system that can detect fully occluded vehicles by involving millimeter-wave radars to capture the ground-reflected signals passing beneath the blocking vehicle’s chassis. The foremost challenge here is coping with ghost points caused by frequent multi-path reflections, which highly resemble the true points. We devise a set of features that can efficiently distinguish the ghost points by exploiting the neighbor points’ spatial and velocity distributions. We also design a cumulative clustering algorithm to effectively aggregate the unstable ground reflected radar points over consecutive frames to derive the bounding boxes of the vehicles. We have evaluated mmOVD in both controlled environments and real-world environments. In an underground garage and two campus roads, we conducted controlled experiments in 56 scenes with 8 vehicles, including a minibus and a motorcycle. Our system accurately detects occluded vehicles for the first time, with a 91.1% F1 score for occluded vehicle detection and a 100% success rate for occlusion event detection. More importantly, we drove 324km on crowded roads at a speed up to 70km per hour and show we could achieve an occlusion detection success rate of 92% and a low false alarm rate of 4% with only 10% of the training data in complex real-world environments. View details
    Preview abstract In this paper we study users' opinions about the privacy of their mobile health apps. We look at what they write in app reviews in the 'Health & Fitness' category on the Google Play store. We identified 2832 apps in this category (based on 1K minimum installs). Using NLP/LLM analyses, we find that 76% of these apps have at least some privacy reviews. In total this yields over 164,000 reviews about privacy, from over 150 countries and in 25 languages. Our analyses identifies top themes and offers an approximation of how widespread these issues are around the world. We show that the top 4 themes - Data Sharing and Exposure, Permission Requests, Location Tracking and Data Collection - are issues of concern in over 70 countries. Our automatically generated thematic summaries reveal interesting aspects that deserve further research around user suspicions (unneeded data collection), user requests (more fine-grained control over data collection and data access), as well as user behavior (uninstalling apps). View details
    Prompt-Based Label-Aware Framework for Few-Shot Multi-Label Text Classification
    Thanakorn Thaminkaew
    Peerapon Vateekul
    IEEE Access, 12 (2024), pp. 28310-28322
    Preview abstract Prompt-based learning has demonstrated remarkable success in few-shot text classification, outperforming the traditional fine-tuning approach. This method transforms a text input into a masked language modeling prompt using a template, queries a fine-tuned language model to fill in the mask, and then uses a verbalizer to map the model’s output to a predicted class. Previous prompt-based text classification approaches were primarily designed for multi-class classification, taking advantage of the fact that the classes are mutually exclusive and one example belongs to only one class. However, these assumptions do not hold in the context of multi-label text classification, where labels often exhibit correlations with each other. Therefore, we propose a Prompt-based Label-Aware framework for Multi-Label text classification (PLAML) that addresses the challenges. Specifically, PLAML enhances prompt-based learning with three proposed techniques to improve the overall performance for multi-label classification. The techniques include (i) a token weighting algorithm that considers the correlations between labels, (ii) a template for augmenting training samples, making the training process label-aware, and (iii) a dynamic threshold mechanism, refining the prediction condition of each label. Extensive experiments on few-shot text classification across multiple datasets with various languages show that our PLAML outperforms other baseline methods. We also analyzed the effect of each proposed technique to better understand how it is suitable for the multi-label setting. View details
    Preview abstract We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time difference of arrival (TDOA) cues, or what we call delay contrast, to separate target and interference sources while remaining robust in various reverberation environments. We demonstrate the model is not only generalizable to a commercially available device with a slightly different microphone geometry, but also outperforms our previous work which uses one additional microphone on the same device. The model runs in real-time on-device and is suitable for low-latency streaming applications such as telephony and video conferencing. View details
    Preview abstract This workshop discusses the cutting-edge developments in research and applications of personalizing large language models (LLMs) and adapting them to the demands of diverse user populations and societal needs. The full-day workshop plan includes several keynotes and invited talks, a poster session and a panel discussion. View details
    Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan
    Atilla Kiraly
    Corbin Cunningham
    Ryan Najafi
    Jie Yang
    Chuck Lau
    Diego Ardila
    Scott Mayer McKinney
    Rory Pilgrim
    Mozziyar Etemadi
    Sunny Jansen
    Lily Peng
    Shravya Shetty
    Neeral Beladia
    Krish Eswaran
    Radiology: Artificial Intelligence (2024)
    Preview abstract Lung cancer is the leading cause of cancer death world-wide with 1.8 million deaths in 20201. Studies have concluded that low-dose computed tomography lung cancer screening can reduce mortality by up to 61%2 and updated 2021 US guidelines expanded eligibility. As screening efforts rise, AI can play an important role, but must be unobtrusively integrated into existing clinical workflows. In this work, we introduce a state-of-the-art, cloud-based AI system providing lung cancer risk assessments without requiring any user input. We demonstrate its efficacy in assisting lung cancer screening under both US and Japanese screening settings using different patient populations and screening protocols. Technical improvements over a previously described system include a focus on earlier cancer detection for improved accuracy, introduction of an effective assistive user interface, and a system designed to integrate into typical clinical workflows. The stand-alone AI system was evaluated on 3085 individuals achieving area under the curve (AUC) scores of 91.7% (95%CI [89.6, 95.2]), 93.3% (95%CI [90.2, 95.7]), and 89.1% (95%CI [77.7, 97.3]) on three datasets (two from US and one from Japan), respectively. To evaluate the system’s assistive ability, we conducted two retrospective multi-reader multi-case studies on 627 cases read by experienced board certified radiologists (average 20 years of experience [7,40]) using local PACS systems in the respective US and Japanese screening settings. The studies measured the reader’s level of suspicion (LoS) and categorical responses for scores and management recommendations under country-specific screening protocols. The radiologists’ AUC for LoS increased with AI assistance by 2.3% (95%CI [0.1-4.5], p=0.022) for the US study and by 2.3% (95%CI [-3.5-8.1], p=0.179) for the Japan study. Specificity for recalls increased by 5.5% (95%CI [2.7-8.5], p<0.0001) for the US and 6.7% (95%CI [4.7-8.7], p<0.0001) for the Japan study. No significant reduction in other metrics occured. This work advances the state-of-the-art in lung cancer detection, introduces generalizable interface concepts that can be applicable to similar AI applications, and demonstrates its potential impact on diagnostic AI in global lung cancer screening with results suggesting a substantial drop in unnecessary follow-up procedures without impacting sensitivity. View details
    Preview abstract This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth. Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications. Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers. View details
    Preview abstract Slow concept drift is a ubiquitous, yet under-studied problem in practical machine learning systems. Although recent data is more indicative of future data in these settings, naively prioritizing these instances runs the risk of losing valuable information from the past. We propose an optimization-driven approach towards balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary \textit{scorer model} that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, by which it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9 year period show upto 15\% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings where, too, we beat SOTA methods by large margins. View details
    Preview abstract Modern code review is a process in which incremental code contributions made by one software developer are reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that the code under review adheres to style guidelines and best practices of the corresponding programming language. Some of these rules are universal and can be checked automatically or enforced via code formatters. Other rules, however, are context-dependent and the corresponding checks are commonly left to developers who are experts in the given programming language and whose time is expensive. Many automated systems have been developed that attempt to detect various rule violations without any human intervention. Historically, such systems implement targeted analyses and were themselves expensive to develop. This paper presents AutoCommenter, a system that uses a state of the art large language model to automatically learn and enforce programming language best practices. We implemented AutoCommenter for four programming languages: C++, Java, Python and Go. We evaluated its performance and adoption in a large industrial setting. Our evaluation shows that a model that automatically learns language best practices is feasible and has a measurable positive impact on the developer workflow. Additionally, we present the challenges we faced when deploying such a model to tens of thousands of developers and provide lessons we learned for any practitioners that would like to replicate the work or build on top of it. View details
    Preview abstract Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is made available to the public.We further discuss new domains that can be added to ToQB by community contributors and its practical applications. View details
    Preview abstract Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using VIT-B on Imagenet. We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context. View details
    Optimization by Decoded Quantum Interferometry
    Stephen Jordan
    Noah Shutty
    Mary Wootters
    Alexander Schmidhuber
    Robbie King
    Sergei Isakov
    arXiv:2408.08292 (2024)
    Preview abstract We introduce Decoded Quantum Interferometry (DQI), a quantum algorithm for reducing classical optimization problems to classical decoding problems by exploiting structure in the Fourier spectrum of the objective function. DQI reduces sparse max-XORSAT to decoding LDPC codes, which can be decoded using powerful classical algorithms such as belief propagation. As an initial benchmark, we compare DQI using belief propagation decoding against classical optimization via simulated annealing. In this setting we identify a family of max-XORSAT instances where DQI achieves a better approximation ratio on average than simulated annealing, although not better than specialized classical algorithms tailored to those instances. We also analyze a combinatorial optimization problem corresponding to finding polynomials that intersect the maximum number of points. There, DQI efficiently achieves a better approximation ratio than any polynomial-time classical algorithm known to us, thus realizing an apparent exponential quantum speedup. Finally, we show that the problem defined by Yamakawa and Zhandry in order to prove an exponential separation between quantum and classical query complexity is a special case of the optimization problem efficiently solved by DQI. View details
    Quantifying urban park use in the USA at scale: empirical estimates of realised park usage using smartphone location data
    Michael T Young
    Swapnil Vispute
    Stylianos Serghiou
    Akim Kumok
    Yash Shah
    Kevin J. Lane
    Flannery Black-Ingersoll
    Paige Brochu
    Monica Bharel
    Sarah Skenazy
    Shailesh Bavadekar
    Mansi Kansal
    Evgeniy Gabrilovich
    Gregory A. Wellenius
    Lancet Planetary Health (2024)
    Preview abstract Summary Background A large body of evidence connects access to greenspace with substantial benefits to physical and mental health. In urban settings where access to greenspace can be limited, park access and use have been associated with higher levels of physical activity, improved physical health, and lower levels of markers of mental distress. Despite the potential health benefits of urban parks, little is known about how park usage varies across locations (between or within cities) or over time. Methods We estimated park usage among urban residents (identified as residents of urban census tracts) in 498 US cities from 2019 to 2021 from aggregated and anonymised opted-in smartphone location history data. We used descriptive statistics to quantify differences in park usage over time, between cities, and across census tracts within cities, and used generalised linear models to estimate the associations between park usage and census tract level descriptors. Findings In spring (March 1 to May 31) 2019, 18·9% of urban residents visited a park at least once per week, with average use higher in northwest and southwest USA, and lowest in the southeast. Park usage varied substantially both within and between cities; was unequally distributed across census tract-level markers of race, ethnicity, income, and social vulnerability; and was only moderately correlated with established markers of census tract greenspace. In spring 2019, a doubling of walking time to parks was associated with a 10·1% (95% CI 5·6–14·3) lower average weekly park usage, adjusting for city and social vulnerability index. The median decline in park usage from spring 2019 to spring 2020 was 38·0% (IQR 28·4–46·5), coincident with the onset of physical distancing policies across much of the country. We estimated that the COVID-19-related decline in park usage was more pronounced for those living further from a park and those living in areas of higher social vulnerability. Interpretation These estimates provide novel insights into the patterns and correlates of park use and could enable new studies of the health benefits of urban greenspace. In addition, the availability of an empirical park usage metric that varies over time could be a useful tool for assessing the effectiveness of policies intended to increase such activities. View details
    Distributed Tracing for InterPlanetary File System
    Marshall David Miller
    Rachel Han
    Haorui Guo
    2024 International Symposium on Parallel Computing and Distributed Systems (PCDS), IEEE, pp. 1-5
    Preview abstract The InterPlanetary File System (IPFS) is on its way to becoming the backbone of the next generation of the web. However, it suffers from several performance bottlenecks, particularly on the content retrieval path, which are often difficult to debug. This is because content retrieval involves multiple peers on the decentralized network and the issue could lie anywhere in the network. Traditional debugging tools are insufficient to help web developers who face the challenge of slow loading websites and detrimental user experience. This limits the adoption and future scalability of IPFS. In this paper, we aim to gain valuable insights into how content retrieval requests propagate within the IPFS network as well as identify potential performance bottlenecks which could lead to opportunities for improvement. We propose a custom tracing framework that generates and manages traces for crucial events that take place on each peer during content retrieval. The framework leverages event semantics to build a timeline of each protocol involved in the retrieval, helping developers pinpoint problems. Additionally, it is resilient to malicious behaviors of the peers in the decentralized environment. We have implemented this framework on top of an existing IPFS implementation written in Java called Nabu. Our evaluation shows that the framework can identify network delays and issues with each peer involved in content retrieval requests at a very low overhead. View details
    Preview abstract Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both isolation and combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models. View details