Jump to Content

Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Publications

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 496 publications
    50 Shades of Support: A Device-Centric Analysis of Android Security Updates
    Abbas Acar
    Esteban Luques
    Harun Oz
    Ahmet Aris
    Selcuk Uluagac
    Network and Distributed System Security (NDSS) Symposium (2024)
    Preview abstract Android is by far the most popular OS with over three billion active mobile devices. As in any software, uncovering vulnerabilities on Android devices and applying timely patches are both critical. Android Open Source Project (AOSP) has initiated efforts to improve the traceability of security updates through Security Patch Levels (SPLs) assigned to devices. While this initiative provided better traceability for the vulnerabilities, it has not entirely resolved the issues related to the timeliness and availability of security updates for end users. Recent studies on Android security updates have focused on the issue of delay during the security update roll-out, largely attributing this to factors related to fragmentation. However, these studies fail to capture the entire Android ecosystem as they primarily examine flagship devices or do not paint a comprehensive picture of the Android devices’ lifecycle due to the datasets spanning over a short timeframe. To address this gap in the literature, we utilize a device-centric approach to analyze the security update behavior of Android devices. Our approach aims to understand the security update distribution behavior of OEMs (e.g., Samsung) by using a representative set of devices from each OEM and characterize the complete lifecycle of an average Android device. We obtained 367K official security update records from public sources, span- ning from 2014 to 2023. Our dataset contains 599 unique devices from four major OEMs that are used in 97 countries and are associated with 109 carriers. We identify significant differences in the roll-out of security updates across different OEMs, device models/types, and geographical regions across the world. Our findings show that the reasons for the delay in the roll-out of security updates are not limited to fragmentation but also involve OEM-specific factors. Our analysis also uncovers certain key issues that can be readily addressed as well as exemplary practices that can be immediately adopted by OEMs in practice. View details
    Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables
    Doguhan Yeke
    Muhammad Ibrahim
    Habiba Farukh
    Abdullah Imran
    Antonio Bianchi
    Z. Berkay Celik
    IEEE Security and Privacy (2024) (to appear)
    Preview abstract Wearable devices are becoming increasingly important, helping us stay healthy and connected. There are a variety of app-based wearable platforms that can be used to manage these devices. The apps on wearable devices often work with a companion app on users’ smartphones. The wearable device and the smartphone typically use two separate permission models that work synchronously to protect sensitive data. However, this design creates an opaque view of the management of permission- protected data, resulting in over-privileged data access without the user’s explicit consent. In this paper, we performed the first systematic analysis of the interaction between the Android and Wear OS permission models. Our analysis is two-fold. First, through taint analysis, we showed that cross-device flows of permission-protected data happen in the wild, demonstrating that 28 apps (out of the 150 we studied) on Google Play have sensitive data flows between the wearable app and its companion app. We found that these data flows occur without the users’ explicit consent, introducing the risk of violating user expectations. Second, we conducted an in-lab user study to assess users’ understanding of permissions when subject to cross-device communication (n = 63). We found that 66.7% of the users are unaware of the possibility of cross-device sensitive data flows, which impairs their understanding of permissions in the context of wearable devices and puts their sensitive data at risk. We also showed that users are vulnerable to a new class of attacks that we call cross-device permission phishing attacks on wearable devices. Lastly, we performed a preliminary study on other watch platforms (i.e., Apple’s watchOS, Fitbit, Garmin OS) and found that all these platforms suffer from similar privacy issues. As countermeasures for the potential privacy violations in cross-device apps, we suggest improvements in the system prompts and the permission model to enable users to make better-informed decisions, as well as on app markets to identify malicious cross-device data flows. View details
    FP-Fed: Privacy-Preserving Federated Detection of Browser Fingerprinting
    Meenatchi Sundaram Muthu Selva Annamalai
    Emiliano De Cristofaro
    Network and Distributed System Security (NDSS) Symposium (2024) (to appear)
    Preview abstract Browser fingerprinting often provides an attractive alternative to third-party cookies for tracking users across the web. In fact, the increasing restrictions on third-party cookies placed by common web browsers and recent regulations like the GDPR may accelerate the transition. To counter browser fingerprinting, previous work proposed several techniques to detect its prevalence and severity. However, these rely on 1) centralized web crawls and/or 2) computationally intensive operations to extract and process signals (e.g., information-flow and static analysis). To address these limitations, we present FP-Fed, the first distributed system for browser fingerprinting detection. Using FP-Fed, users can collaboratively train on-device models based on their real browsing patterns, without sharing their training data with a central entity, by relying on Differentially Private Federated Learning (DP-FL). To demonstrate its feasibility and effectiveness, we evaluate FP-Fed’s performance on a set of 18.3k popular websites with different privacy levels, numbers of participants, and features extracted from the scripts. Our experiments show that FP-Fed achieves reasonably high detection performance and can perform both training and inference efficiently, on-device, by only relying on runtime signals extracted from the execution trace, without requiring any resource-intensive operation. View details
    Broadly Enabling KLEE to Effortlessly Find Unrecoverable Errors
    Ying Zhang
    Peng Li
    Lingxiang Wang
    Na Meng
    Dan Williams
    (2024)
    Preview abstract Rust is a general-purpose programming language designed for performance and safety. Unrecoverable errors (e.g., Divide by Zero) in Rust programs are critical, as they signal bad program states and terminate programs abruptly. Previous work has contributed to utilizing KLEE, a dynamic symbolic test engine, to verify the program would not panic. However, it is difficult for engineers who lack domain expertise to write test code correctly. Besides, the effectiveness of KLEE in finding panics in production Rust code has not been evaluated. We created an approach, called PanicCheck, to hide the complexity of verifying Rust programs with KLEE. Using PanicCheck, engineers only need to annotate the function-to-verify with #[panic_check]. The annotation guides PanicCheck to generate test code, compile the function together with tests, and execute KLEE for verification. After applying PanicCheck to 21 open-source and 2 closed-source projects, we found 61 test inputs that triggered panics; 60 of the 61 panics have been addressed by developers so far. Our research shows promising verification results by KLEE, while revealing technical challenges in using KLEE. Our experience will shed light on future practice and research in program verification. View details
    AI-powered patching: the future of automated vulnerability fixes
    Jan Keller
    Google Security Engineering Technical Report (2024) (to appear)
    Preview abstract As AI continues to advance at rapid speed, so has its ability to unearth hidden security vulnerabilities in all types of software. Every bug uncovered is an opportunity to patch and strengthen code—but as detection continues to improve, we need to be prepared with new automated solutions that bolster our ability to fix those bugs. That’s why our Secure AI Framework (SAIF) includes a fundamental pillar addressing the need to “automate defenses to keep pace with new and existing threats.” This paper shares lessons from our experience leveraging AI to scale our ability to fix bugs, specifically those found by sanitizers in C/C++, Java, and Go code. By automating a pipeline to prompt Large Language Models (LLMs) to generate code fixes for human review, we have harnessed our Gemini model to successfully fix 15% of sanitizer bugs discovered during unit tests, resulting in hundreds of bugs patched. Given the large number of sanitizer bugs found each year, this seemingly modest success rate will with time save significant engineering effort. We expect this success rate to continually improve and anticipate that LLMs can be used to fix bugs in various languages across the software development lifecycle. View details
    Preview abstract This paper reflects on work at Google over the past decade to address common types of software safety and security defects. Our experience has shown that software safety is an emergent property of the software and tooling ecosystem it is developed in and the production environment into which it is deployed. Thus, to effectively prevent common weaknesses at scale, we need to shift-left the responsibility for ensuring safety and security invariants to the end-to-end developer ecosystem, that is, programming languages, software libraries, application frameworks, build and deployment tooling, the production platform and its configuration surfaces, and so forth. Doing so is practical and cost effective when developer ecosystems are designed with application archetypes in mind, such as web or mobile apps: The design of the developer ecosystem can address threat model aspects that apply commonly to all applications of the respective archetype, and investments to ensure safety invariants at the ecosystem level amortize across many applications. Applying secure-by-design principles to developer ecosystems at Google has achieved drastic reduction and in some cases near-zero residual rates of common classes of defects, across hundreds of applications being developed by thousands of developers. View details
    Secure by Design at Google
    Google Security Engineering (2024)
    Preview abstract This whitepaper provides an overview of Google's approach to secure design. View details
    Preview abstract 2022 marked the 50th anniversary of memory safety vulnerabilities, first reported by Anderson et al. Half a century later, we are still dealing with memory safety bugs despite substantial investments to improve memory unsafe languages. Like others', Google’s data and internal vulnerability research show that memory safety bugs are widespread and one of the leading causes of vulnerabilities in memory-unsafe codebases. Those vulnerabilities endanger end users, our industry, and the broader society. At Google, we have decades of experience addressing, at scale, large classes of vulnerabilities that were once similarly prevalent as memory safety issues. Based on this experience we expect that high assurance memory safety can only be achieved via a Secure-by-Design approach centered around comprehensive adoption of languages with rigorous memory safety guarantees. We see no realistic path for an evolution of C++ into a language with rigorous memory safety guarantees that include temporal safety. As a consequence, we are considering a gradual transition of C++ code at Google towards other languages that are memory safe. Given the large volume of pre-existing C++, we believe it is nonetheless necessary to improve the safety of C++ to the extent practicable. We are considering transitioning to a safer C++ subset, augmented with hardware security features like MTE. View details
    Website Data Transparency in the Browser
    Sebastian Zimmeck
    Daniel Goldelman
    Owen Kaplan
    Logan Brown
    Justin Casler
    Judeley Jean-Charles
    Joe Champeau
    24th Privacy Enhancing Technologies Symposium (PETS 2024), PETS (to appear)
    Preview abstract Data collection by websites and their integrated third parties is often not transparent. We design privacy interfaces for the browser to help people understand who is collecting which data from them. In a proof of concept browser extension, Privacy Pioneer, we implement a privacy popup, a privacy history interface, and a watchlist to notify people when their data is collected. For detecting location data collection, we develop a machine learning model based on TinyBERT, which reaches an average F1 score of 0.94. We supplement our model with deterministic methods to detect trackers, collection of personal data, and other monetization techniques. In a usability study with 100 participants 82% found Privacy Pioneer easy to understand and 90% found it useful indicating the value of privacy interfaces directly integrated in the browser. View details
    Securing the AI Software Supply Chain
    Isaac Hepworth
    Kara Olive
    Kingshuk Dasgupta
    Michael Le
    Mark Lodato
    Mihai Maruseac
    Sarah Meiklejohn
    Shamik Chaudhuri
    Tehila Minkus
    Google, Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2024)
    Preview abstract As AI-powered features gain traction in software applications, we see many of the same problems we’ve faced with traditional software—but at an accelerated pace. The threat landscape continues to expand as AI is further integrated into everyday products, so we can expect more attacks. Given the expense of building models, there is a clear need for supply chain solutions. This paper explains our approach to securing our AI supply chain using provenance information and provides guidance for other organizations. Although there are differences between traditional and AI development processes and risks, we can build on our work over the past decade using Binary Authorization for Borg (BAB), Supply-chain Levels for Software Artifacts (SLSA), and next-generation cryptographic signing solutions via Sigstore, and adapt these to the AI supply chain without reinventing the wheel. Depending on internal processes and platforms, each organization’s approach to AI supply chain security will look different, but the focus should be on areas where it can be improved in a relatively short time. Readers should note that the first part of this paper provides a broad overview of “Development lifecycles for traditional and AI software”. Then we delve specifically into AI supply chain risks, and explain our approach to securing our AI supply chain using provenance information. More advanced practitioners may prefer to go directly to the sections on “AI supply chain risks,” “Controls for AI supply chain security,” or even the “Guidance for practitioners” section at the end of the paper, which can be adapted to the needs of any organization. View details
    Hybrid Post-Quantum Signatures in Hardware Security Keys
    Diana Ghinea
    Jennifer Pullman
    Julien Cretin
    Rafael Misoczki
    Stefan Kölbl
    Applied Cryptography and Network Security Workshop (2023)
    Preview abstract Recent advances in quantum computing are increasingly jeopardizing the security of cryptosystems currently in widespread use, such as RSA or elliptic-curve signatures. To address this threat, researchers and standardization institutes have accelerated the transition to quantum-resistant cryptosystems, collectively known as Post-Quantum Cryptography (PQC). These PQC schemes present new challenges due to their larger memory and computational footprints and their higher chance of latent vulnerabilities. In this work, we address these challenges by introducing a scheme to upgrade the digital signatures used by security keys to PQC. We introduce a hybrid digital signature scheme based on two building blocks: a classically-secure scheme, ECDSA, and a post-quantum secure one, Dilithium. Our hybrid scheme maintains the guarantees of each underlying building block even if the other one is broken, thus being resistant to classical and quantum attacks. We experimentally show that our hybrid signature scheme can successfully execute on current security keys, even though secure PQC schemes are known to require substantial resources. We publish an open-source implementation of our scheme at https://github.com/google/OpenSK/releases/tag/hybrid-pqc so that other researchers can reproduce our results on a nRF52840 development kit. View details
    Preview abstract We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy, compared to the space needed without privacy. To the best of our knowledge, this is the first separation between the space complexity of private and non-private algorithms. (2) The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries. We revisit previous lower bounds at a foundational level, and show that they are a consequence of a space bottleneck rather than a sampling bottleneck. To obtain our results, we define and construct an encryption scheme with multiple keys that is built to withstand a limited amount of key leakage in a very particular way. View details
    Preview abstract Online hate and harassment poses a threat to the digital safety of people globally. In light of this risk, there is a need to equip as many people as possible with advice to stay safer online. We interviewed 24 experts to understand what threats and advice internet users should prioritize to prevent or mitigate harm. As part of this, we asked experts to evaluate 45 pieces of existing hate-and-harassment-specific digital-safety advice to understand why they felt advice was viable or not. We find that experts frequently had competing perspectives for which threats and advice they would prioritize. We synthesize sources of disagreement, while also highlighting the primary threats and advice where experts concurred. Our results inform immediate efforts to protect users from online hate and harassment, as well as more expansive socio-technical efforts to establish enduring safety. View details
    On the Potential of Mediation Chatbots for Mitigating Multiparty Privacy Conflicts - A Wizard-of-Oz Study
    Kavous Salehzadeh Niksirat
    Diana Korka
    Kévin Huguenin
    Mauro Cherubini
    The 26th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW) (2023) (to appear)
    Preview abstract Sharing multimedia content, without obtaining consent from the people involved causes multiparty privacy conflicts (MPCs). However, social-media platforms do not proactively protect users from the occurrence of MPCs. Hence, users resort to out-of-band, informal communication channels, attempting to mitigate such conflicts. So far, previous works have focused on hard interventions that do not adequately consider the contextual factors (e.g., social norms, cognitive priming) or are employed too late (i.e., the content has already been seen). In this work, we investigate the potential of conversational agents as a medium for negotiating and mitigating MPCs. We designed MediationBot, a mediator chatbot that encourages consent collection, enables users to explain their points of view, and proposes solutions to finding a middle ground. We evaluated our design using a Wizard-of-Oz experiment with N=32 participants, where we found that MediationBot can effectively help participants to reach an agreement and to prevent MPCs. It produced a structured conversation where participants had well-clarified speaking turns. Overall, our participants found MediationBot to be supportive as it proposes useful middle-ground solutions. Our work informs the future design of mediator agents to support social-media users against MPCs. View details
    Preview abstract The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\widetilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of $\widetilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications. View details