Luca Invernizzi

Authored Publications
    Give and Take: An End-To-End Investigation of Giveaway Scams
    Eric Mugnier
    Enze Liu
    Stefan Savage
    Geoffrey M. Voelker
    David Tao
    George Kappos
    Sarah Meiklejohn
    2024
    Scams, fraudulent narratives designed to extract money or items of value from victims, have existed as long as recorded history. However, the Internet’s combination of low communication cost, global reach, and functional anonymity has allowed scam volumes to reach their historic zenith. Designing effective interventions against such activities requires first understanding the context in which they thrive: how scammers advertise to potential victims, the proceeds they can expect in response, and how they ultimately monetize their illicit activities. In this paper, we focus on such questions in the specific context of a giveaway scam, in which scammers offer to give away cryptocurrency to users who send them coins first (often promising to send them back double whatever they sent). In particular, our work aims to understand how such giveaway scams are advertised both on textual social media (Twitter) and via video livestreams (YouTube and Twitch), the extent to which such efforts are effective in attracting victims, and the scope and nature of the payments received in such fraudulent transactions.
    The task of content-type detection, which entails determining the data type encoded by byte streams, has a long history within the realm of computing, and nowadays it is a key primitive for critical automated pipelines. The first program ever developed to perform this task is "file", which shipped with Bell Labs UNIX over five decades ago. Since then, a number of additional tools have been developed, but, despite their importance, to date it is not clear how well these approaches perform, and whether modern techniques can improve over the state of the art. This paper sheds light on this overlooked area. We collect a dataset of more than 26M samples, and we perform the first large-scale evaluation of existing content type tools. Then, we introduce Magika, a new content type detection tool based on deep learning. Magika is designed to be fast (5ms inference time), even on a single CPU, thus making it a viable replacement for existing command line tools and suitable for large-scale automated pipelines. Magika achieves 99%+ average precision and recall, which is a double-digit % accuracy improvement (in absolute terms) over the state of the art. As a testament to its real-world utility, we are working with a large email provider and with Visual Studio Code developers on integrating Magika to be their reference content-type detector. To ease reproducibility, we release all our artifacts, including the tool, the model, the training pipeline, the dataset collection codebase, and details about our dataset.
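    For context, the signature-based approach used by classic content-type tools can be sketched as a lookup of well-known magic-byte prefixes. This is a minimal illustration only: it is not Magika (which the abstract describes as a deep-learning model), and the table `MAGIC_SIGNATURES` and the function `detect_content_type` are hypothetical names introduced here.

```python
# Hypothetical, deliberately tiny signature table. Real tools ship
# thousands of rules; these prefixes are well-known file signatures.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def detect_content_type(data: bytes) -> str:
    """Return a coarse content-type label for a byte string."""
    # First pass: match the leading bytes against known signatures.
    for prefix, label in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return label
    # Fallback: anything that decodes as UTF-8 is treated as text.
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"
```

    Prefix matching of this kind is fast but brittle for formats without a fixed header (for example, most source-code and plain-text formats), which is the kind of gap a learned detector aims to close.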
    Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning
    Karel Král
    Marina Zhang
    Transactions on Cryptographic Hardware and Embedded Systems (TCHES), IACR (2024)
    To make cryptographic processors more resilient against side-channel attacks, engineers have developed various countermeasures. However, the effectiveness of these countermeasures is often uncertain, as it depends on the complex interplay between software and hardware. Assessing a countermeasure’s effectiveness using profiling techniques or machine learning so far requires significant expertise and effort to be adapted to new targets, which makes those assessments expensive. We argue that including cost-effective automated attacks will help chip design teams to quickly evaluate their countermeasures during the development phase, paving the way to more secure chips. In this paper, we lay the foundations toward such an automated system by proposing GPAM, the first deep-learning system for power side-channel analysis that generalizes across multiple cryptographic algorithms, implementations, and side-channel countermeasures without the need for manual tuning or trace preprocessing. We demonstrate GPAM’s capability by successfully attacking four hardened hardware-accelerated elliptic-curve digital-signature implementations. We showcase GPAM’s ability to generalize across multiple algorithms by attacking a protected AES implementation and achieving comparable performance to state-of-the-art attacks, but without manual trace curation and within a limited budget. We release our data and models as an open-source contribution to allow the community to independently replicate our results and build on them.
    ML models have shown significant promise in their ability to identify side channels leaking from a secure chip. However, the datasets used to train these models present unique challenges. Existing file formats often lack the ability to record metadata, which impedes the reusability and/or reproducibility of published datasets. Moreover, training pipelines for deep neural networks often require specific patterns to iterate through the data that are not provided by these file formats. In this presentation, we talk about the lessons learned in our research on side-channel attacks, and share insights gained from our mistakes in data structuring and iteration strategies. Additionally, we present Sedpack, our open-source dataset library which encapsulates these learnings to minimize oversights. Sedpack is optimized for speed, as we will demonstrate with some preliminary benchmarks. It can scale to larger-than-local-storage datasets, as these are becoming larger and larger with PQC. And it is not limited to ML pipelines either, as it can be easily used for classical attacks too. Join us to try Sedpack, with our hope to save you time in your side-channel research efforts. To get you started, we also publish several datasets in this format that we used in our publication Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning, CHES 2024.
    Hybrid Post-Quantum Signatures in Hardware Security Keys
    Diana Ghinea
    Jennifer Pullman
    Julien Cretin
    Rafael Misoczki
    Stefan Kölbl
    Applied Cryptography and Network Security Workshop (2023)
    Recent advances in quantum computing are increasingly jeopardizing the security of cryptosystems currently in widespread use, such as RSA or elliptic-curve signatures. To address this threat, researchers and standardization institutes have accelerated the transition to quantum-resistant cryptosystems, collectively known as Post-Quantum Cryptography (PQC). These PQC schemes present new challenges due to their larger memory and computational footprints and their higher chance of latent vulnerabilities. In this work, we address these challenges by introducing a scheme to upgrade the digital signatures used by security keys to PQC. We introduce a hybrid digital signature scheme based on two building blocks: a classically-secure scheme, ECDSA, and a post-quantum secure one, Dilithium. Our hybrid scheme maintains the guarantees of each underlying building block even if the other one is broken, thus being resistant to classical and quantum attacks. We experimentally show that our hybrid signature scheme can successfully execute on current security keys, even though secure PQC schemes are known to require substantial resources. We publish an open-source implementation of our scheme at https://github.com/google/OpenSK/releases/tag/hybrid-pqc so that other researchers can reproduce our results on an nRF52840 development kit.
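    One common way to combine two signature schemes so that the result holds as long as either one remains unbroken is to sign with both and accept only if both signatures verify. The sketch below illustrates that combiner under loud assumptions: `ToyScheme` is a hypothetical HMAC-based stand-in (symmetric, and neither ECDSA nor Dilithium), and `hybrid_sign` / `hybrid_verify` are illustrative names, not the API of the published implementation.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyScheme:
    """Hypothetical stand-in for a signature scheme.

    It uses HMAC with a shared key, so it is NOT a real public-key
    signature; it only lets the combiner logic below run end to end.
    """
    name: str
    key: bytes

    def sign(self, message: bytes) -> bytes:
        # Domain-separate by scheme name so the two tags differ.
        return hmac.new(self.key, self.name.encode() + message,
                        hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), signature)


def hybrid_sign(classical, post_quantum, message: bytes):
    """Sign with both schemes; the hybrid signature is the pair."""
    return classical.sign(message), post_quantum.sign(message)


def hybrid_verify(classical, post_quantum, message: bytes, signature) -> bool:
    """Accept only if BOTH component signatures verify."""
    sig_classical, sig_pq = signature
    ok_classical = classical.verify(message, sig_classical)
    ok_pq = post_quantum.verify(message, sig_pq)
    return ok_classical and ok_pq
```

    Because verification requires both components, forging a hybrid signature means forging under both schemes, which is why breaking only one of them is not enough.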
    Spotlight: Malware Lead Generation at Scale
    Bernhard Grill
    Jennifer Pullman
    Cecilia M. Procopiuc
    David Tao
    Borbala Benko
    Proceedings of Annual Computer Security Applications Conference (ACSAC) (2020)
    Malware is one of the key threats to online security today, with applications ranging from phishing mailers to ransomware and trojans. Due to the sheer size and variety of the malware threat, it is impractical to combat it as a whole. Instead, governments and companies have instituted teams dedicated to identifying, prioritizing, and removing specific malware families that directly affect their population or business model. The identification and prioritization of the most disconcerting malware families (known as malware hunting) is a time-consuming activity, accounting for more than 20% of the work hours of a typical threat intelligence researcher, according to our survey. To save this precious resource and amplify the team’s impact on users’ online safety we present Spotlight, a large-scale malware lead-generation framework. Spotlight first sifts through a large malware data set to remove known malware families, based on first and third-party threat intelligence. It then clusters the remaining malware into potentially-undiscovered families, and prioritizes them for further investigation using a score based on their potential business impact. We evaluate Spotlight on 67M malware samples, to show that it can produce top-priority clusters with over 99% purity (i.e., homogeneity), which is higher than simpler approaches and prior work. To showcase Spotlight’s effectiveness, we apply it to ad-fraud malware hunting on real-world data. Using Spotlight’s output, threat intelligence researchers were able to quickly identify three large botnets that perform ad fraud.
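    The purity figure quoted above can be made concrete with a short sketch. It assumes the common definition of cluster purity (the fraction of a cluster's samples that share its majority ground-truth label); the abstract does not spell out the exact definition used, and `cluster_purity` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Sequence


def cluster_purity(labels: Sequence[Hashable]) -> float:
    """Fraction of samples carrying the cluster's most common label.

    `labels` holds one ground-truth family label per sample in a
    single cluster. A value of 1.0 means a perfectly homogeneous cluster.
    """
    if not labels:
        raise ValueError("cluster must contain at least one sample")
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)
```

    For example, a cluster with three samples of one family and one sample of another has purity 0.75 under this definition.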
    Traffic monetization is a crucial component of running most for-profit online businesses. One of its latest incarnations is cryptocurrency mining, where a website instructs the visitor’s browser to participate in building a cryptocurrency ledger (e.g., Bitcoin, Monero) in exchange for a small reward in the same currency. In its essence, this practice trades the user’s electric bill (or battery level) for cryptocurrency. With user consent, this exchange can be a legitimate funding source: for example, UNICEF has collected over 27k charity donations on a website dedicated to this purpose, thehopepage.org. Regrettably, this practice also easily lends itself to abuse: in this form, called cryptojacking, attackers surreptitiously mine in the user’s browser, and profits are collected either by website owners or by hackers that planted the mining script into a vulnerable page. Understandably, users frown upon this practice and have sought to mitigate it by installing blacklist-based browser extensions (the top 3 for Chrome total over one million installs), whereas researchers have devised more robust methods to detect it [1]–[6]. In turn, cryptojackers have been bettering their evasion techniques, incorporating in their toolkits domain fluxing, content obfuscation, the use of WebAssembly, and throttling. The latter, for example, grew from being a niche feature, adopted by only one in ten sites in 2018 [2], to become commonplace in 2019, reaching an adoption ratio of 58%. Whereas most state-of-the-art defenses address multiple of these evasion techniques, none is resistant against all. In this paper, we offer a novel detection method, CoinPolice, that is robust against all of the aforementioned evasion techniques. CoinPolice flips throttling against cryptojackers, artificially varying the browser’s CPU power to observe the presence of throttling. Based on a deep neural network classifier, CoinPolice can detect 97.87% of hidden miners with a low false positive rate (0.74%). We compare CoinPolice’s performance with the current state of the art and show our approach outperforms it when detecting aggressively throttled miners. Finally, we deploy CoinPolice to perform the largest-scale cryptomining investigation to date, identifying 6700 sites that monetize traffic in this fashion.
    Protecting accounts from credential stuffing with password breach alerting
    Jennifer Pullman
    Kevin Yeo
    Ananth Raghunathan
    Patrick Gage Kelley
    Borbala Benko
    Sarvar Patel
    Dan Boneh
    Proceedings of the USENIX Security Symposium, Usenix (2019)
    Protecting accounts from credential stuffing attacks remains burdensome due to an asymmetry of knowledge: attackers have wide-scale access to billions of stolen usernames and passwords, while users and identity providers remain in the dark as to which accounts require remediation. In this paper, we propose a privacy-preserving protocol whereby a client can query a centralized breach repository to determine whether a specific username and password combination is publicly exposed, but without revealing the information queried. Here, a client can be an end user, a password manager, or an identity provider. To demonstrate the feasibility of our protocol, we implement a cloud service that mediates access to over 4 billion credentials found in breaches and a Chrome extension serving as an initial client. Based on anonymous telemetry from nearly 670,000 users and 21 million logins, we find that 1.5% of logins on the web involve breached credentials. By alerting users to this breach status, 26% of our warnings result in users migrating to a new password, at least as strong as the original. Our study illustrates how secure, democratized access to password breach alerting can help mitigate one dimension of account hijacking.
    Five Years of the Right to be Forgotten
    Theo Bertram
    Stephanie Caro
    Hubert Chao
    Rutledge Chin Feman
    Peter Fleischer
    Albin Gustafsson
    Jess Hemerly
    Chris Hibbert
    Lanah Kammourieh Donnelly
    Jason Ketover
    Jay Laefer
    Paul Nicholas
    Yuan Niu
    Harjinder Obhi
    David Price
    Andrew Strait
    Al Verney
    Proceedings of the Conference on Computer and Communications Security (2019)
    The “Right to be Forgotten” is a privacy ruling that enables Europeans to delist certain URLs appearing in search results related to their name. In order to illuminate the effect this ruling has on information access, we conducted a retrospective measurement study of 3.2 million URLs that were requested for delisting from Google Search over five years. Our analysis reveals the countries and anonymized parties generating the largest volume of requests (just 1,000 requesters generated 16% of requests); the news, government, social media, and directory sites most frequently targeted for delisting (17% of removals relate to a requester’s legal history including crimes and wrongdoing); and the prevalence of extraterritorial requests. Our results dramatically increase transparency around the Right to be Forgotten and reveal the complexity of weighing personal privacy against public interest when resolving multi-party privacy conflicts that occur across the Internet. The results of our investigation have since been added to Google’s transparency report.
    Tracking Ransomware End-to-end
    Danny Y. Huang
    Maxwell Matthaios Aliapoulios
    Vector Guo Li
    Kylie McRoberts
    Jonathan Levin
    Kirill Levchenko
    Alex C. Snoeren
    Damon McCoy
    Security & Privacy 2018 (2018)
    Ransomware is a type of malware that encrypts the files of infected hosts and demands payment, often in a cryptocurrency such as bitcoin. In this paper, we create a measurement framework that we use to perform a large-scale, two-year, end-to-end measurement of ransomware payments, victims, and operators. By combining an array of data sources, including ransomware binaries, seed ransom payments, victim telemetry from infections, and a large database of bitcoin addresses annotated with their owners, we sketch the outlines of this burgeoning ecosystem and associated third-party infrastructure. In particular, we trace the financial transactions, from the moment victims acquire bitcoins, to when ransomware operators cash them out. We find that many ransomware operators cashed out using BTC-e, a now-defunct bitcoin exchange. In total we are able to track over $16 million in likely ransom payments made by 19,750 potential victims during a two-year period. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal operations that have similarly adopted bitcoin as their payment channel.