Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 10132 publications
    ASTRA-5G: Automated Over-the-Air Security Testing and Research Architecture for 5G SA Devices
    Aanjhan Ranganathan
    Christina Pöpper
    Evangelos Bitsikas
    Michele Guerra
    Roger Piqueras Jover
    Syed Khandker
    WiSec '24: Proceedings of the 17th ACM Conference on Security and Privacy in Wireless and Mobile Networks, ACM (2024)
    Despite the widespread deployment of 5G technologies, there exists a critical gap in security testing for 5G Standalone (SA) devices. Existing methods, largely manual and labor-intensive, are ill-equipped to fully uncover the state of security in the implementations of 5G-SA protocols and standards on devices, severely limiting the ability to conduct comprehensive evaluations. To address this issue, in this work, we introduce a novel, open-source framework that automates the security testing process for 5G SA devices. By leveraging enhanced functionalities of 5G SA core and Radio Access Network (RAN) software, our framework offers a streamlined approach to generating, executing, and evaluating test cases, specifically focusing on the Non-Access Stratum (NAS) layer. Our application of this framework across multiple 5G SA devices provides in-depth security insights, significantly improving testing efficiency and breadth.
    The effect of regularizers such as weight decay when training deep neural networks is not well understood. We study the influence of weight decay as well as $L2$-regularization when training neural network models in which parameter matrices interact multiplicatively. This combination is of particular interest as this parametrization is common in attention layers, the workhorse of transformers. Here, key-query, as well as value-projection parameter matrices, are multiplied directly with each other: $W_K^TW_Q$ and $PW_V$. We extend previous results and show on one hand that any local minimum of an $L2$-regularized loss of the form $L(AB^\top) + \lambda (\|A\|^2 + \|B\|^2)$ coincides with a minimum of the nuclear norm-regularized loss $L(AB^\top) + \lambda\|AB^\top\|_*$, and on the other hand that the two losses become identical exponentially quickly during training. We thus complement existing works linking $L2$-regularization with low-rank regularization, and in particular, explain why such regularization on the matrix product affects early stages of training. Based on these theoretical insights, we verify empirically that the key-query and value-projection matrix products $W_K^TW_Q, PW_V$ within attention layers, when optimized with weight decay, as usually done in vision tasks and language modelling, indeed induce a significant reduction in the rank of $W_K^TW_Q$ and $PW_V$, even in fully online training. We find that, in accordance with existing work, inducing low rank in attention matrix products can damage language model performance, and observe advantages when decoupling weight decay in attention layers from the rest of the parameters.
    The area of security measurability is gaining increased attention, with a wide range of organizations calling for the development of scalable approaches for assessing the security of software systems and infrastructure. In this paper, we present our experience developing Security Signals, a comprehensive system providing security measurability for web services, deployed in a complex application ecosystem of thousands of web services handling traffic from billions of users. The system collects security-relevant information from production HTTP traffic at the reverse proxy layer, utilizing novel concepts such as synthetic signals augmented with additional risk information to provide a holistic view of the security posture of individual services and the broader application ecosystem. This approach to measurability has enabled large-scale security improvements to our services, including allowing prioritized rollouts of security enhancements and the implementation of automated regression monitoring; it has proven valuable for security research and prioritization of defensive work. Security Signals addresses shortcomings of prior web measurability proposals by tracking a comprehensive set of security properties relevant to web applications, and by extracting insights from collected data for use by both security experts and non-experts. We believe the lessons learned from the implementation and use of Security Signals offer valuable insights for practitioners responsible for web service security, potentially inspiring new approaches to web security measurability.
    Google services are powered by the largest network of computers in the world. Site Reliability Engineers (SREs) make sure that the whole stack is cool: datacenters are safe and well provisioned; we have fallback mechanisms and data integrity; and we design our stack properly, using the right storage, replication and software trade-offs. Generative AI is a great tool to make us super-effective: having access to tools to generate our most toily configurations, to classify risks and events, to manage large swaths of machines with agents or to automate complex workflows cheaply. This talk will cover the journey that SRE started years ago to become a truly AI-First discipline and the latest advancements in tooling, practices and workflows.
    Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of, generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (ImageNet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using ViT-B on ImageNet. We also show that using naturally hard examples for validation (ImageNet-R / ImageNet-A) in LRW training for ImageNet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.
    Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
    Martin Mladenov
    James Pine
    Hubert Pham
    Shane Li
    Xujian Liang
    Anton Polishko
    Li Yang
    Ben Scheetz
    Proceedings of the 47th International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR-24), Washington, DC (2024), pp. 2925-2929
    Evaluation of policies in recommender systems (RSs) typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for onboarding new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live, sometimes more reliably than live experiments due to the scale at which simulation can be realized. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.
    PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
    Marina Neseem
    Conor McCullough
    Randy Hsin
    Chas Leichner
    Shan Li
    In Suk Chong
    Andrew Howard
    Lukasz Lew
    Sherief Reda
    Ville-Mikko Rautio
    Daniele Moro
    Conference on Computer Vision and Pattern Recognition (2024) (to appear)
    Low-precision quantization is recognized for its efficacy in neural network optimization. Our analysis reveals that non-quantized elementwise operations, which are prevalent in layers such as parameterized activation functions, batch normalization, and quantization scaling, dominate the inference cost of low-precision models. These non-quantized elementwise operations are commonly overlooked in SOTA efficiency metrics such as Arithmetic Computation Effort (ACE). In this paper, we propose ACEv2, an extended version of ACE which offers a better alignment with the inference cost of quantized models and their energy consumption on ML hardware. Moreover, we introduce PikeLPN, a model that addresses these efficiency issues by applying quantization to both elementwise operations and multiply-accumulate operations. In particular, we present a novel quantization technique for batch normalization layers named QuantNorm, which allows for quantizing the batch normalization parameters without compromising the model performance. Additionally, we propose applying Double Quantization, where the quantization scaling parameters are quantized. Furthermore, we recognize and resolve the issue of distribution mismatch in Separable Convolution layers by introducing Distribution-Heterogeneous Quantization, which enables quantizing them to low precision. PikeLPN achieves Pareto-optimality in efficiency-accuracy trade-off with up to 3X efficiency improvement compared to SOTA low-precision models.
    Quartic Quantum Speedups for Planted Inference Problems
    Alexander Schmidhuber
    Ryan O'Donnell
    arXiv:2406.19378 (2024)
    We describe a quantum algorithm for the Planted Noisy kXOR problem (also known as sparse Learning Parity with Noise) that achieves a nearly quartic (4th power) speedup over the best known classical algorithm while also only using logarithmically many qubits. Our work generalizes and simplifies prior work of Hastings, by building on his quantum algorithm for the Tensor Principal Component Analysis (PCA) problem. We achieve our quantum speedup using a general framework based on the Kikuchi Method (recovering the quartic speedup for Tensor PCA), and we anticipate it will yield similar speedups for further planted inference problems. These speedups rely on the fact that planted inference problems naturally instantiate the Guided Sparse Hamiltonian problem. Since the Planted Noisy kXOR problem has been used as a component of certain cryptographic constructions, our work suggests that some of these are susceptible to super-quadratic quantum attacks.
    RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
    Yun Zhu
    Simon Tong
    Lei Meng
    Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18970-18980 (2024)
    In recent years, Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities in text generation tasks expressed through natural language instructions. However, text rewriting is a challenging task, and unintended modifications can negatively impact the system's performance. To address this challenge, we introduce a novel benchmark for text rewriting that covers a wide variety of rewriting types expressed through natural language instructions. Unlike previous benchmarks, which were primarily focused on limited rewrite styles and sentence-level rewriting, our benchmark is specifically designed to facilitate open-ended rewriting of long-form text. Additionally, we present a strong baseline model, RewriteLM, which is an instruction-tuned large language model for text rewriting. The model is trained using supervised fine-tuning, reward training, and reinforcement learning. To minimize human intervention in the data collection process, we develop new data generation strategies: (1) utilizing high-quality, long-form edits from Wikipedia as our primary natural training data source, (2) generating a synthetic dataset that includes diverse edit types and non-Wiki domains using chain-of-thoughts and the capabilities of LLMs, and (3) employing human-designed heuristic rankers to generate preference data. Our experiments demonstrate the effectiveness of our proposed benchmark and baseline model, as well as the benefits of our data collection strategies in minimizing human intervention.
    With the increase in the number of privacy regulations, small development teams are forced to make privacy decisions on their own. In this paper, we conduct a mixed-method survey study, including statistical and qualitative analysis, to evaluate the privacy perceptions, practices, and knowledge of members involved in various phases of the Software Development Life Cycle (SDLC). Our survey includes 362 participants from 23 countries, encompassing roles such as product managers, developers, and testers. Our results show diverse definitions of privacy across SDLC roles, emphasizing the need for a holistic privacy approach throughout SDLC. We find that software teams, regardless of their region, are less familiar with privacy concepts (such as anonymization), relying on self-teaching and forums. Most participants are more familiar with GDPR and HIPAA than other regulations, with multi-jurisdictional compliance being their primary concern. Our results advocate the need for role-dependent solutions to address the privacy challenges, and we highlight research directions and educational takeaways to help improve privacy-aware SDLC.
    Relational affect is the affective response (encompassing emotion, expression, feeling) that emerges from an interaction between two people. The case study presented here introduces the concept of relational affect through a human perceptual rating task. Forty-five raters watched short video clips of two people interacting and described their perceived emotion of the individuals and that of the overall interaction. Our qualitative analysis of the rater responses showed that raters used a variety of schemes to reason about emotion, including expressions, context, and perceived appraisal of the event. These reasoning schemes were notably different for perceived individual emotion and relational affect. Our findings show that the vocabulary used for relational affect is distinct from that of individual emotion, and that relational affect as a phenomenon deepens our understanding of social interactions and moves the field a step closer to realizing the goal of fluid interactions between people and technology.
    Prompt-Based Label-Aware Framework for Few-Shot Multi-Label Text Classification
    Thanakorn Thaminkaew
    Peerapon Vateekul
    IEEE Access, 12 (2024), pp. 28310-28322
    Prompt-based learning has demonstrated remarkable success in few-shot text classification, outperforming the traditional fine-tuning approach. This method transforms a text input into a masked language modeling prompt using a template, queries a fine-tuned language model to fill in the mask, and then uses a verbalizer to map the model’s output to a predicted class. Previous prompt-based text classification approaches were primarily designed for multi-class classification, taking advantage of the fact that the classes are mutually exclusive and one example belongs to only one class. However, these assumptions do not hold in the context of multi-label text classification, where labels often exhibit correlations with each other. Therefore, we propose a Prompt-based Label-Aware framework for Multi-Label text classification (PLAML) that addresses the challenges. Specifically, PLAML enhances prompt-based learning with three proposed techniques to improve the overall performance for multi-label classification. The techniques include (i) a token weighting algorithm that considers the correlations between labels, (ii) a template for augmenting training samples, making the training process label-aware, and (iii) a dynamic threshold mechanism, refining the prediction condition of each label. Extensive experiments on few-shot text classification across multiple datasets with various languages show that our PLAML outperforms other baseline methods. We also analyzed the effect of each proposed technique to better understand how it is suitable for the multi-label setting.
    Interruptions in digital services are a common occurrence for users. These disruptions, however, exact a cost in terms of attention, task completion rate, and, most importantly, emotional state. While several methods currently employed by service providers attempt to address this, the paper will argue that browser games or similar interactive interfaces should become a standard mechanism to ease the aforementioned effects.
    Millions of people turn to Google Search each day for information on things as diverse as new cars or flu symptoms. The terms that they enter contain valuable information on their daily intent and activities, but the information in these search terms has been difficult to fully leverage. User-defined categorical filters have been the most common way to shrink the dimensionality of search data to a tractable size for analysis and modeling. In this paper we present a new approach to reducing the dimensionality of search data while retaining much of the information in the individual terms without user-defined rules. Our contributions are two-fold: 1) we introduce SLaM Compression, a way to quantify search terms using pre-trained language models and create a representation of search data that has low dimensionality, is memory efficient, and effectively acts as a summary of search, and 2) we present CoSMo, a Constrained Search Model for estimating real world events using only search data. We demonstrate the efficacy of our contributions by estimating with high accuracy U.S. automobile sales and U.S. flu rates using only Google Search data.