Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10092 publications
See Through Vehicles: Fully Occluded Vehicle Detection with Millimeter Wave Radar
Chenming He
Chengzhen Meng
Chunwang He
Beibei Wang
Yubo Yan
Yanyong Zhang
MobiCom 2024: The 30th Annual International Conference On Mobile Computing And Networking
Preview abstract
A crucial task in autonomous driving is to continuously detect nearby vehicles. Problems thus arise when a vehicle is occluded and becomes “unseeable”, which may lead to accidents. In this study, we develop mmOVD, a system that can detect fully occluded vehicles by involving millimeter-wave radars to capture the ground-reflected signals passing beneath the blocking vehicle’s chassis. The foremost challenge here is coping with ghost points caused by frequent multi-path reflections, which highly resemble the true points. We devise a set of features that can efficiently distinguish the ghost points by exploiting the neighbor points’ spatial and velocity distributions. We also design a cumulative clustering algorithm to effectively aggregate the unstable ground reflected radar points over consecutive frames to derive the bounding boxes of the vehicles.
We have evaluated mmOVD in both controlled environments and real-world environments. In an underground garage and two campus roads, we conducted controlled experiments in 56 scenes with 8 vehicles, including a minibus and a motorcycle. Our system accurately detects occluded vehicles for the first time, with a 91.1% F1 score for occluded vehicle detection and a 100% success rate for occlusion event detection. More importantly, we drove 324km on crowded roads at a speed up to 70km per hour and show we could achieve an occlusion detection success rate of 92% and a low false alarm rate of 4% with only 10% of the training data in complex real-world environments.
View details
Preview abstract
Artificial Intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to drive innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system of organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer focused innovation in dermatology to help guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI.
View details
Score Distillation Sampling with Learned Manifold Corrective
European Conference on Computer Vision (ECCV) (2024)
Preview abstract
Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different factors and isolate the component responsible for noisy gradients. In the original formulation, high text guidance is used to account for the noise, leading to unwanted side effects such as oversaturation or repeated detail. Instead, we train a shallow network mimicking the timestep-dependent frequency bias of the image diffusion model in order to effectively factor it out. We demonstrate the versatility and the effectiveness of our novel loss formulation through qualitative and quantitative experiments, including optimization-based image synthesis and editing, zero-shot image translation network training, and text-to-3D synthesis.
View details
Federated Variational Inference: Towards Improved Personalization and Generalization
Elahe Vedadi
Josh Dillon
Philip Mansfield
Karan Singhal
Arash Afkanpour
Warren Morningstar
AAAI Federated Learning on the Edge Symposium (2024)
Preview abstract
Conventional federated learning algorithms train a single global model by leveraging all participating clients' data. However, due to heterogeneity in client generative distributions and predictive models, these approaches may not appropriately approximate the predictive process, converge to an optimal state, or generalize to new clients. We study personalization and generalization in stateless cross-device federated learning setups assuming heterogeneity in client data distributions and predictive models. We first propose a hierarchical generative model and formalize it using Bayesian Inference. We then approximate this process using Variational Inference to train our model efficiently. We call this algorithm Federated Variational Inference (FedVI). We use PAC-Bayes analysis to provide generalization bounds for FedVI. We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks.
View details
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
Advances in Neural Information Processing Systems (NeurIPS) (2024) (to appear)
Preview abstract
Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, these have evolved rather independently, with IO recently receiving more research attention. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and ES techniques, both isolation and combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs obtained from evaluating prompts on the validation set as exemplars consistently improves performance over IO methods but is currently under-investigated. We also find that despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions, with ES strategies as simple as random search outperforming state-of-the-art IO methods with seed instructions without any optimization. Moreover, we observe synergy between ES and IO, with optimal combinations surpassing individual contributions. We conclude that studying exemplar selection as a standalone method and its optimal combination with instruction optimization remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
View details
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Yuanwen Yue
Sabarinath Mahadevan
Jonas Schult
Francis Engelmann
Bastian Leibe
Konrad Schindler
Theodora Kontogianni
ICLR (2024)
Preview abstract
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments objects one at a time. The model expects the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks on regions wrongly assigned to the object. Sequentially visiting objects is wasteful since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, a direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. Our core idea is to encode user clicks as spatial-temporal queries and enable explicit interactions between click queries as well as between them and the 3D scene through a click attention module. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies. Project page: https://ywyue.github.io/AGILE3D.
View details
On the Robustness of Image-based Malware Detection against Adversarial Attacks
Yassine Mekdad
Harun Oz
Ahmet Aris
Leonardo Babun
Faraz Naseem
Selcuk Uluagac
Nasir Ghani
Abbas Acar
Network Security Empowered by Artificial Intelligence, Springer (2024)
Preview abstract
Machine and deep learning models are now one of the most valuable tools in the arsenal of computer security practitioners. Their success has been demonstrated in various network-security-oriented applications such as intrusion detection, cyber threat intelligence, vulnerability discovery, and malware detection. Nevertheless, recent research studies have shown that crafted adversarial samples can be used to evade malware detection models. Even though several defense mechanisms such as adversarial training have been proposed in the malware detection domain to address this issue, they unfortunately suffer from model poisoning and low detection accuracy. In this chapter, we assess the robustness of image-based malware classifier against four different adversarial attacks: (a) random and benign brute-force byte append attacks for black-box settings and (b) random and benign Fast Gradient Sign Method (FGSM) attacks for white-box settings. To this end, we implement a Convolutional Neural Network (CNN) to classify the image representations of Windows Portable Executable (PE) malware with a detection accuracy of 95.05%. Then, we evaluate its robustness along with MalConv, a state-of-the-art malware classifier, by applying a set of functionality-preserving adversarial attacks. Our experimental results demonstrate that image-based classifier exhibits a lower evasion rate of 5% compared to MalConv that achieves an evasion rate ranging between 44 and 54% in black-box settings. However, in white-box settings, both models fail against random byte and benign byte FGSM attacks, with an evasion rate of more than 46%.
View details
PROMPT: A Fast and Extensible Memory Profiling Framework
Ziyang Xu
Yebin Chon
Yian Su
Zujun Tan
Simone Campanoni
David I. August
Proceedings of the ACM on Programming Languages, 8, Issue OOPSLA (2024)
Preview abstract
Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT where all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1× respectively. To further underscore PROMPT's impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
View details
Preview abstract
We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSformer) that can predict accurate NOCS, instance masks and poses from 2D object detections across diverse classes. It is the first NOCS model that can generalize to a broad range of classes when prompted with 2D boxes. We evaluate our model on the task of 3D oriented bounding box prediction, where it achieves comparable results to state-of-the-art 3D detection methods such as Cube R-CNN. Unlike other 3D detection methods, our model also provides detailed and accurate 3D object shape and segmentation. We propose a novel benchmark for the task of NOCS prediction based on OmniNOCS, which we hope will serve as a useful baseline for future work in this area. Our dataset and code is available at the project website: https://omninocs.github.io
View details
Preview abstract
Vortex is an exabyte scale structured storage system built for streaming and batch analytics. It supports high-throughput batch and stream ingestion. For the user, it supports both batch-oriented and stream-based processing on the ingested data.
View details
Prompt Cache: Modular Attention Reuse for Low Latency Inference
Anurag Khandelwal
Guojun Chen
In Gim
Lin Zhong
Seung-seob Lee
Νikhil Sarda
MLSys (2024)
Preview abstract
We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing
attention states across different LLM prompts. Many input prompts have overlapping text segments, such as
system messages, prompt templates, and documents provided for context. Our key insight is that by precomputing
and storing the attention states of these frequently occurring text segments on the inference server, we can
efficiently reuse them when these segments appear in user prompts. Prompt Cache employs a schema to explicitly
define such reusable text segments, called prompt modules. The schema ensures positional accuracy during
attention state reuse and provides users with an interface to access cached states in their prompt. Using a prototype
implementation, we evaluate Prompt Cache across several LLMs. We show that Prompt Cache significantly reduce
latency in time-to-first-token, especially for longer prompts such as document-based question answering and
recommendations. The improvements range from 8× for GPU-based inference to 60× for CPU-based inference,
all while maintaining output accuracy and without the need for model parameter modifications.
View details
Preview abstract
Interruptions in digital services are a common occurrence for users. These disruptions, however, exact a cost in terms of attention, task completion rate, and, most importantly, emotional state. While several methods currently employed by service providers attempt to address this, the paper will argue that browser games or similar interactive interfaces should become a standard mechanism to ease the aforementioned effects.
View details
Preview abstract
With the increase in the number of privacy regulations, small development teams are forced to make privacy decisions on their own. In this paper, we conduct a mixed-method survey study, including statistical and qualitative analysis, to evaluate the privacy perceptions, practices, and knowledge of members involved in various phases of the Software Development Life Cycle (SDLC). Our survey includes 362 participants from 23 countries, encompassing roles such as product managers, developers, and testers. Our results show diverse definitions of privacy across SDLC roles, emphasizing the need for a holistic privacy approach throughout SDLC. We find that software teams, regardless of their region, are less familiar with privacy concepts (such as anonymization), relying on self-teaching and forums. Most participants are more familiar with GDPR and HIPAA than other regulations, with multi-jurisdictional compliance being their primary concern. Our results advocate the need for role-dependent solutions to address the privacy challenges, and we highlight research directions and educational takeaways to help improve privacy-aware SDLC.
View details
Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search
Daniel Russell
Stella Dugall
Harvard Data Science Review (2024)
Preview abstract
With the rise of open data in the last two decades, more datasets are online and more people are using them for projects and research. But how do people find datasets? We present the first user study of Google Dataset Search, a dataset-discovery tool that uses a web crawl and open ecosystem to find datasets. Google Dataset Search contains a superset of the datasets in other dataset-discovery tools—a total of 45 million datasets from 13,000 sources. We found that the tool addresses a previously identified need: a search engine for datasets across the entire web, including datasets in other tools. However, the tool introduced new challenges due to its open approach: building a mental model of the tool, making sense of heterogeneous datasets, and learning how to search for datasets. We discuss recommendations for dataset-discovery tools and open research questions.
View details
Preview abstract
Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-k predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
View details