Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Showing 1–15 of 10,132 publications
    Abstract: Zero-shot text rankers powered by recent LLMs achieve remarkable ranking performance by simply prompting. Existing prompts for pointwise LLM rankers mostly ask the model to choose from binary relevance labels like "Yes" and "No". However, the lack of intermediate relevance label options may cause the LLM to provide noisy or biased answers for documents that are partially relevant to the query. We propose to incorporate fine-grained relevance labels into the prompt for LLM rankers, enabling them to better differentiate among documents with different levels of relevance to the query and thus derive a more accurate ranking. We study two variants of the prompt template, coupled with different numbers of relevance levels. Our experiments on 8 BEIR data sets show that adding fine-grained relevance labels significantly improves the performance of LLM rankers.
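A minimal sketch of the idea, under our own assumptions rather than the paper's exact prompt: the pointwise prompt offers graded labels instead of Yes/No, and the ranking score is the expected label value under the model's label likelihoods. The `label_logprobs` helper below is a hypothetical stand-in for an LLM scoring API.

```python
# Hypothetical sketch: pointwise ranking with fine-grained relevance labels.
# `label_logprobs(prompt, labels)` is a stand-in for an LLM call that returns
# the log-probability of each candidate label as the continuation.
import math

LABELS = {"Not Relevant": 0, "Somewhat Relevant": 1, "Highly Relevant": 2}

PROMPT = (
    "Query: {query}\n"
    "Document: {doc}\n"
    "Rate the relevance of the document to the query as one of: "
    "Not Relevant, Somewhat Relevant, Highly Relevant.\nAnswer:"
)

def relevance_score(query, doc, label_logprobs):
    """Expected relevance under the model's distribution over labels."""
    lps = label_logprobs(PROMPT.format(query=query, doc=doc), list(LABELS))
    probs = [math.exp(lp) for lp in lps]
    z = sum(probs)
    return sum((p / z) * v for p, v in zip(probs, LABELS.values()))

def rank(query, docs, label_logprobs):
    return sorted(docs, key=lambda d: relevance_score(query, d, label_logprobs),
                  reverse=True)
```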
    On the Robustness of Image-based Malware Detection against Adversarial Attacks
    Yassine Mekdad
    Harun Oz
    Ahmet Aris
    Leonardo Babun
    Faraz Naseem
    Selcuk Uluagac
    Nasir Ghani
    Abbas Acar
    Network Security Empowered by Artificial Intelligence, Springer (2024)
    Abstract: Machine and deep learning models are now one of the most valuable tools in the arsenal of computer security practitioners. Their success has been demonstrated in various network-security-oriented applications such as intrusion detection, cyber threat intelligence, vulnerability discovery, and malware detection. Nevertheless, recent research studies have shown that crafted adversarial samples can be used to evade malware detection models. Even though several defense mechanisms such as adversarial training have been proposed in the malware detection domain to address this issue, they unfortunately suffer from model poisoning and low detection accuracy. In this chapter, we assess the robustness of an image-based malware classifier against four different adversarial attacks: (a) random and benign brute-force byte append attacks for black-box settings and (b) random and benign Fast Gradient Sign Method (FGSM) attacks for white-box settings. To this end, we implement a Convolutional Neural Network (CNN) to classify the image representations of Windows Portable Executable (PE) malware with a detection accuracy of 95.05%. Then, we evaluate its robustness along with MalConv, a state-of-the-art malware classifier, by applying a set of functionality-preserving adversarial attacks. Our experimental results demonstrate that the image-based classifier exhibits a lower evasion rate of 5%, compared to MalConv, which achieves an evasion rate ranging between 44% and 54% in black-box settings. However, in white-box settings, both models fail against random byte and benign byte FGSM attacks, with an evasion rate of more than 46%.
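For readers unfamiliar with the white-box attack evaluated here, the following is a minimal FGSM sketch against a generic PyTorch image classifier. It is illustrative only, not the chapter's code; a real functionality-preserving malware attack would additionally restrict the perturbation to appended or benign byte regions.

```python
# Minimal FGSM sketch (illustrative, not the chapter's implementation).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Return an adversarial example x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    # For image representations of binaries, a functionality-preserving attack
    # would only perturb pixels corresponding to appended/benign bytes.
    return x_adv.clamp(0.0, 1.0).detach()
```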
    Abstract: We explore the boundaries of scaling up a multilingual vision and language model, both in terms of the size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. Our model advances the state of the art on most vision-and-language benchmarks considered (20+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
    Abstract: We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.
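As one plausible reading of "whitened" fundamental frequency conditioning (an assumption on our part, not StreamVC's published recipe), the pitch track can be normalized per utterance so that only relative pitch movements, not the speaker's absolute pitch range, are passed to the decoder:

```python
# Illustrative sketch of per-utterance F0 whitening (assumed interpretation,
# not the StreamVC implementation).
import numpy as np

def whiten_f0(f0_hz, eps=1e-8):
    """Normalize log-F0 over voiced frames to zero mean / unit variance."""
    f0 = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    if voiced.any():
        log_f0 = np.log(f0[voiced])
        out[voiced] = (log_f0 - log_f0.mean()) / (log_f0.std() + eps)
    return out  # unvoiced frames stay at 0
```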
    Sleep patterns and risk of chronic disease as measured by long-term monitoring with commercial wearable devices in the All of Us Research Program
    Neil S. Zheng
    Jeffrey Annis
    Hiral Master
    Lide Han
    Karla Gleichauf
    Melody Nasser
    Peyton Coleman
    Stacy Desine
    Douglas M. Ruderfer
    John Hernandez
    Logan D. Schneider
    Evan L. Brittain
    Nature Medicine (2024)
    Abstract: Poor sleep health is associated with increased all-cause mortality and incidence of many chronic conditions. Previous studies have relied on cross-sectional and self-reported survey data or polysomnograms, which have limitations with respect to data granularity, sample size and longitudinal information. Here, using objectively measured, longitudinal sleep data from commercial wearable devices linked to electronic health record data from the All of Us Research Program, we show that sleep patterns, including sleep stages, duration and regularity, are associated with chronic disease incidence. Of the 6,785 participants included in this study, 71% were female, 84% self-identified as white and 71% had a college degree; the median age was 50.2 years (interquartile range = 35.7, 61.5) and the median sleep monitoring period was 4.5 years (2.5, 6.5). We found that rapid eye movement sleep and deep sleep were inversely associated with the odds of incident atrial fibrillation and that increased sleep irregularity was associated with increased odds of incident obesity, hyperlipidemia, hypertension, major depressive disorder and generalized anxiety disorder. Moreover, J-shaped associations were observed between average daily sleep duration and hypertension, major depressive disorder and generalized anxiety disorder. These findings show that sleep stages, duration and regularity are all important factors associated with chronic disease development and may inform evidence-based recommendations on healthy sleeping habits.
    Abstract: Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-k predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
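A rough sketch of the kind of statistic the abstract points to; the aggregation rule below is our own simplification, not necessarily the paper's estimator. Instead of trusting the decoder's probability for the single best output, it measures how much of the top-k beam probability mass supports a given labeled span.

```python
# Illustrative sketch: span confidence from top-k beam search outputs.
# `beams` is a list of (parsed_spans, sequence_prob) pairs, where parsed_spans
# is a set of (span_text, label) tuples extracted from each hypothesis.
def span_confidence(span, beams):
    """Probability mass of beams (renormalized over top-k) containing `span`."""
    total = sum(p for _, p in beams)
    if total == 0:
        return 0.0
    support = sum(p for spans, p in beams if span in spans)
    return support / total

# Example usage with k=3 hypothetical hypotheses:
beams = [
    ({("Paris", "LOC"), ("ACME", "ORG")}, 0.52),
    ({("Paris", "LOC")}, 0.31),
    ({("Paris", "PER")}, 0.08),
]
print(span_confidence(("Paris", "LOC"), beams))  # ~0.91
```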
    Sketching for Distributed Deep Learning: A Sharper Analysis
    Mayank Shrivastava
    Qiaobo Li
    Sanmi Koyejo
    Arindam Banerjee
    Conference on Neural Information Processing Systems (NeurIPS) (2024)
    Abstract: The high communication cost between the server and the clients is a significant bottleneck in scaling distributed learning for overparametrized deep models. One popular approach for reducing this communication overhead is randomized sketching. However, existing theoretical analyses for sketching-based distributed learning (sketch-DL) either incur a prohibitive dependence on the ambient dimension or need additional restrictive assumptions such as heavy-hitters. Nevertheless, despite existing pessimistic analyses, empirical evidence suggests that sketch-DL is competitive with its uncompressed counterpart -- thus motivating a sharper analysis. In this work, we introduce a sharper ambient dimension-independent convergence analysis for sketch-DL using the second-order geometry specified by the loss Hessian. Our results imply ambient dimension-independent communication complexity for sketch-DL. We present empirical results both on the loss Hessian and overall accuracy of sketch-DL supporting our theoretical results. Taken together, our results provide theoretical justification for the observed empirical success of sketch-DL.
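For intuition about what "randomized sketching" means in this setting, here is a toy count-sketch compress/decompress of a gradient vector, written under our own assumptions; the paper analyzes the convergence of such schemes rather than prescribing this particular code.

```python
# Toy count-sketch of a gradient vector (illustrative only).
import numpy as np

def make_sketcher(dim, width, seed=0):
    rng = np.random.default_rng(seed)
    bucket = rng.integers(0, width, size=dim)   # hash: coordinate -> bucket
    sign = rng.choice([-1.0, 1.0], size=dim)    # random signs

    def compress(g):
        s = np.zeros(width)
        np.add.at(s, bucket, sign * g)          # clients transmit `s`, not `g`
        return s

    def decompress(s):
        return sign * s[bucket]                 # unbiased estimate of g

    return compress, decompress

compress, decompress = make_sketcher(dim=10_000, width=512)
g = np.random.randn(10_000)
g_hat = decompress(compress(g))                 # server-side estimate of g
```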
    With Great Power Comes Great Responsibility: Security and Privacy Issues of Modern Browser APIs
    Harun Oz
    Daniele Cono D’Elia
    Abbas Acar
    Riccardo Lazzeretti
    Selcuk Uluagac
    IEEE Security and Privacy (2024)
    Abstract: This paper discusses security and privacy issues in modern Browser APIs by categorizing them based on their functionality. With this study, we aim to alert the community about these issues and motivate further research into analyzing the security and privacy concerns within modern Browser APIs.
    FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces
    Safa C. Medin
    Gengyan Li
    Stephan Garbin
    Philip Davidson
    Gregory W. Wornell
    Thabo Beeler
    Abhimitra Meka
    Proceedings of the ACM on Computer Graphics and Interactive Techniques, 7 (2024), pp. 1-17
    Abstract: 3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with minimal compute and memory footprint. It runs natively on commodity graphics soft- and hardware, and allows for a graceful trade-off between quality and efficiency. Our method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as temporal canonical texture. We export a single layered mesh and view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, at comparable photorealism to state-of-the-art neural rendering techniques at previously unseen frame rates.
    Federated Variational Inference: Towards Improved Personalization and Generalization
    Elahe Vedadi
    Josh Dillon
    Philip Mansfield
    Karan Singhal
    Arash Afkanpour
    Warren Morningstar
    AAAI Federated Learning on the Edge Symposium (2024)
    Abstract: Conventional federated learning algorithms train a single global model by leveraging all participating clients' data. However, due to heterogeneity in client generative distributions and predictive models, these approaches may not appropriately approximate the predictive process, converge to an optimal state, or generalize to new clients. We study personalization and generalization in stateless cross-device federated learning setups assuming heterogeneity in client data distributions and predictive models. We first propose a hierarchical generative model and formalize it using Bayesian Inference. We then approximate this process using Variational Inference to train our model efficiently. We call this algorithm Federated Variational Inference (FedVI). We use PAC-Bayes analysis to provide generalization bounds for FedVI. We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI beats the state-of-the-art on both tasks.
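A highly simplified sketch of the variational-inference ingredient, as a generic illustration under our own assumptions rather than the FedVI algorithm itself: each client keeps a Gaussian posterior over its personalized parameters, regularized toward a global prior through a KL term, and trains it with the reparameterization trick.

```python
# Generic sketch of a reparameterized variational update on one client
# (illustrative; not the FedVI training procedure).
import torch
import torch.nn.functional as F

def client_vi_step(mu, log_sigma, prior_mu, prior_sigma, features, labels, lr=1e-2):
    """One ELBO step for a Gaussian posterior over a linear head's weights."""
    mu = mu.clone().requires_grad_(True)
    log_sigma = log_sigma.clone().requires_grad_(True)
    sigma = log_sigma.exp()
    w = mu + sigma * torch.randn_like(mu)            # reparameterized sample
    nll = F.cross_entropy(features @ w, labels)      # data-fit term
    kl = (torch.log(prior_sigma / sigma)             # KL(q || global prior)
          + (sigma**2 + (mu - prior_mu) ** 2) / (2 * prior_sigma**2) - 0.5).sum()
    loss = nll + kl / labels.numel()
    loss.backward()
    with torch.no_grad():
        mu -= lr * mu.grad
        log_sigma -= lr * log_sigma.grad
    return mu.detach(), log_sigma.detach()
```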
    Scaling Up LLM Reviews for Google Ads Content Moderation
    Ariel Fuxman
    Chih-Chun Chia
    Dongjin Kwon
    Enming Luo
    Mehmet Tek
    Ranjay Krishna
    Tiantian Fang
    Tushar Dogra
    Yu-Han Lyu
    (2024)
    Abstract: Large language models (LLMs) are powerful tools for content moderation, but LLM inference costs and latency on large volumes of data, such as the Google Ads repository, are prohibitive for their casual usage. This study is focused on scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. Then, LLMs are used to review only the representative ads. Finally, we propagate the LLM decisions for representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving a 2x recall compared to a non-LLM model as a baseline. Note that the success of this approach is a strong function of the representations used in clustering and label propagation; we observed that cross-modal similarity representations yield better results than uni-modal representations.
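The pipeline described above can be summarized in a few lines of Python. The `embed` and `llm_review` functions are hypothetical placeholders, and the clustering and representative-selection choices below are simplifications rather than the production system.

```python
# Illustrative sketch of cluster-then-review-then-propagate moderation.
# `ads` is a list of ad IDs/texts; `embed(ad)` and `llm_review(ad)` are
# hypothetical placeholders for the embedding model and the LLM reviewer.
from collections import defaultdict
import numpy as np
from sklearn.cluster import KMeans

def moderate(ads, embed, llm_review, n_clusters):
    vectors = np.stack([embed(ad) for ad in ads])       # e.g. cross-modal embeddings
    cluster_ids = KMeans(n_clusters=n_clusters).fit_predict(vectors)

    clusters = defaultdict(list)
    for ad, c in zip(ads, cluster_ids):
        clusters[c].append(ad)

    decisions = {}
    for members in clusters.values():
        representative = members[0]                      # one LLM call per cluster
        verdict = llm_review(representative)
        for ad in members:                               # propagate to the cluster
            decisions[ad] = verdict
    return decisions
```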
    SAC124 - SSAC Advice on Name Collision Analysis
    Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 15
    Abstract: In this document the Security and Stability Advisory Committee (SSAC) provides its analysis of the findings and recommendations presented within the Name Collision Analysis Project (NCAP) Study Two and the proposed Name Collision Risk Assessment Framework. The SSAC also provides additional commentary on several aspects of the NCAP Study Two Report and makes recommendations to the ICANN Board.
    Abstract: Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models.
    Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
    Susan Lin
    Jeremy Warner
    J.D. Zamfirescu-Pereira
    Matthew G Lee
    Sauhard Jain
    Michael Xuelin Huang
    Bjoern Hartmann
    Can Liu
    Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA
    Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge, and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneously spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.
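As a rough illustration of the two function families described above (the prompt wording and the `llm` helper are hypothetical, not Rambler's implementation), gist extraction and a macro revision such as "merge" can both be phrased as LLM calls over whole dictated chunks rather than precise edit locations:

```python
# Hypothetical sketch of gist extraction and a "merge" macro revision.
# `llm(prompt)` is a placeholder for any text-completion call.
def extract_gist(dictated_text, llm):
    prompt = ("Summarize the following dictated text in one sentence and "
              f"list 3 keywords:\n\n{dictated_text}")
    return llm(prompt)

def merge_chunks(chunk_a, chunk_b, llm):
    prompt = ("Merge the two dictated passages below into one coherent, "
              "concise paragraph, preserving the speaker's intent:\n\n"
              f"Passage A:\n{chunk_a}\n\nPassage B:\n{chunk_b}")
    return llm(prompt)
```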
    Abstract: We're roughly 10 years into the OpenConfig journey. We have implementations in hand from various vendors, and we've gained significant operational experience in the domains of Streaming Telemetry and in Developing Configuration Systems to leverage the developed models. What have we learned? Are the abstractions we've generated the right ones? If not, why? Were we too influenced by the tools and inertia of the time when we made some critical decisions? How do we need to evolve going forward? This discussion is part retrospective/introspective, a candid look at where we've been and what we need to think about as we evolve the next generation of our management (and control) planes. What should we be thinking about as network engineers who write software?