Igor Bilogrevic

I am a Staff Research Scientist and research lead. I work on applied machine learning to build novel privacy and security features in our products. I hold a PhD in applied cryptography and machine learning for privacy-enhancing technologies from EPFL.

Previously, I collaborated with the Nokia Research Center on privacy challenges in pervasive mobile networks, encompassing data, location and information-sharing privacy. I spent a summer at PARC (a Xerox Company), conducting research on private data analytics. I am a co-inventor on several patents filed by Nokia, PARC and Google.

I am interested in several domains related to the application of machine learning and AI to privacy and security, such as web browser privacy and contextual intelligence.

Authored Publications
    A recent large-scale experiment conducted by Chrome has demonstrated that a "quieter" web permission prompt can reduce unwanted interruptions while only marginally affecting grant rates. However, the experiment and the partial roll-out were missing two important elements: (1) an effective and context-aware activation mechanism for such a quieter prompt, and (2) an analysis of user attitudes and sentiment towards such an intervention. In this paper, we address these two limitations by means of a novel ML-based activation mechanism -- and its real-world on-device deployment in Chrome -- and a large-scale user study with 13.1k participants from 156 countries. First, the telemetry-based results, computed on more than 20 million samples from Chrome users in-the-wild, indicate that the novel on-device ML-based approach is both extremely precise (>99% post-hoc precision) and has very high coverage (96% recall for notifications permission). Second, our large-scale, in-context user study shows that quieting is often perceived as helpful and does not cause high levels of unease for most respondents.
    Shorts vs. Regular Videos on YouTube: A Comparative Analysis of User Engagement and Content Creation Trends
    Caroline Violot
    Tugrulcan Elmas
    Mathias Humbert
    ACM Web Science Conference (WebSci 2024) (to appear)
    YouTube introduced the Shorts video format in 2021, allowing users to upload short videos that are prominently displayed on its website and app. Despite having such a large visual footprint, there are no studies to date that have looked at the impact the introduction of Shorts had on the production and consumption of content on YouTube. This paper presents the first comparative analysis of YouTube Shorts versus regular videos with respect to user engagement (i.e., views, likes, and comments), content creation frequency and video categories. We collected a dataset containing information about 70k channels that posted at least one Short, and we analyzed the metadata of all the videos (9.9M Shorts and 6.9M regular videos) they uploaded between January 2021 and December 2022, spanning a two-year period including the introduction of Shorts. Our longitudinal analysis shows that content creators consistently increased the frequency of Shorts production over this period, especially for newly-created channels, which surpassed that of regular videos. We also observe that Shorts target mostly entertainment categories, while regular videos cover a wide variety of categories. In general, Shorts attract more views and likes per view than regular videos, but attract fewer comments per view. However, Shorts do not outperform regular videos in the education and political categories as much as they do in other categories. Our study contributes to understanding social media dynamics, to quantifying the spread of short-form content, and to motivating future research on its impact on society.
    Browser fingerprinting is often associated with cross-site user tracking, a practice that many browsers (e.g., Safari, Brave, Edge, Firefox and Chrome) want to block. However, less is publicly known about its uses to enhance online safety, where it can provide an additional security layer against service abuses (e.g., in combination with CAPTCHAs) or during user authentication. To the best of our knowledge, no fingerprinting defenses deployed thus far consider this important distinction when blocking fingerprinting attempts, so they might negatively affect website functionality and security. To address this issue, we make three main contributions. First, we propose and evaluate a novel machine learning-based method to automatically identify authentication pages (i.e., sign-in and sign-up pages). Our algorithm -- which relies on a hybrid unsupervised/supervised approach -- achieves 96-98% precision and recall on a large, manually-labelled dataset of 10,000 popular sites. Second, we compare our algorithm with other methods from prior works on the same dataset, showing that it significantly outperforms all of them (+83% F1-score). Third, we quantify the prevalence of fingerprinting scripts across sign-in and sign-up pages (9.2%) versus those executed on other pages (8.9%); while the rates of fingerprinting are similar, home pages and authentication pages differ in the third-party scripts they include and how often these scripts are labeled as tracking. We also highlight the substantial differences in fingerprinting behavior on login and sign-up pages. Our work sheds light on the complicated reality that fingerprinting is used to both protect user security and invade user privacy, and that this dual nature must be considered by fingerprinting mitigations.
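    The paper's hybrid unsupervised/supervised detector is not reproduced here, but the kind of page-level signal such a classifier consumes can be sketched with a toy, purely illustrative feature extractor (the keyword list, threshold, and function names are hypothetical, not the paper's algorithm):

```python
import re

# Hypothetical lexical signals; a real detector would use far richer DOM features.
SIGNALS = ["password", "sign in", "log in", "sign up", "create account", "forgot"]

def auth_page_features(html):
    """Toy feature vector: occurrence counts of auth-related keywords
    plus the number of password input fields."""
    text = html.lower()
    feats = [len(re.findall(re.escape(s), text)) for s in SIGNALS]
    feats.append(text.count('type="password"'))
    return feats

def looks_like_auth_page(html, min_hits=2):
    """Label a page as an authentication page if enough signals fire."""
    return sum(1 for f in auth_page_features(html) if f > 0) >= min_hits

page = '<form><input type="password"><button>Sign in</button></form>'
```

    A real pipeline would feed such vectors into the supervised stage; this sketch only shows why lexical features alone already separate many sign-in pages from ordinary content.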
    Assessing Web Fingerprinting Risk
    Robert Busa-Fekete
    Antonio Sartori
    Umar Syed
    Proceedings of the ACM Web Conference (WWW 2024)
    Modern Web APIs allow developers to provide extensively customized experiences for website visitors, but the richness of the device information they provide also makes them vulnerable to being abused by malign actors to construct browser fingerprints, device-specific identifiers that enable covert tracking of users even when cookies are disabled. Previous research has established entropy, a measure of information, as the key metric for quantifying fingerprinting risk. Earlier studies that estimated the entropy of Web APIs were based on data from a single website or were limited to an extremely small sample of clients. They also analyzed each Web API separately and then summed their entropies to quantify overall fingerprinting risk, an approach that can lead to gross overestimates. We provide the first study of browser fingerprinting which addresses the limitations of prior work. Our study is based on actual visited pages and Web API function calls reported by tens of millions of real Chrome browsers in-the-wild. We accounted for the dependencies and correlations among Web APIs, which is crucial for obtaining more realistic entropy estimates. We also developed a novel experimental design that accurately estimates entropy while never observing too much information from any single user. Our results provide an understanding of the distribution of entropy for different website categories, confirm the utility of entropy as a fingerprinting proxy, and offer a method for evaluating browser enhancements which are intended to mitigate fingerprinting.
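    The overestimation from summing per-API entropies can be seen in a minimal empirical-entropy calculation (toy data, not the paper's estimator): for two perfectly correlated signals, the joint entropy is half the sum of the marginals.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two hypothetical API readings observed across clients; the second is
# fully determined by the first (perfect correlation).
api_a = ["x", "y", "x", "y", "x", "y", "x", "y"]
api_b = [v.upper() for v in api_a]

sum_of_marginals = entropy(api_a) + entropy(api_b)  # 2.0 bits
joint = entropy(list(zip(api_a, api_b)))            # 1.0 bit
```

    Summing marginals here doubles the true fingerprinting risk, which is why the paper models dependencies among Web APIs rather than treating them independently.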
    FP-Fed: Privacy-Preserving Federated Detection of Browser Fingerprinting
    Meenatchi Sundaram Muthu Selva Annamalai
    Emiliano De Cristofaro
    Network and Distributed System Security (NDSS) Symposium (2024) (to appear)
    Preview abstract Browser fingerprinting often provides an attractive alternative to third-party cookies for tracking users across the web. In fact, the increasing restrictions on third-party cookies placed by common web browsers and recent regulations like the GDPR may accelerate the transition. To counter browser fingerprinting, previous work proposed several techniques to detect its prevalence and severity. However, these rely on 1) centralized web crawls and/or 2) computationally intensive operations to extract and process signals (e.g., information-flow and static analysis). To address these limitations, we present FP-Fed, the first distributed system for browser fingerprinting detection. Using FP-Fed, users can collaboratively train on-device models based on their real browsing patterns, without sharing their training data with a central entity, by relying on Differentially Private Federated Learning (DP-FL). To demonstrate its feasibility and effectiveness, we evaluate FP-Fed’s performance on a set of 18.3k popular websites with different privacy levels, numbers of participants, and features extracted from the scripts. Our experiments show that FP-Fed achieves reasonably high detection performance and can perform both training and inference efficiently, on-device, by only relying on runtime signals extracted from the execution trace, without requiring any resource-intensive operation. View details
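    FP-Fed's actual training loop involves more machinery (secure aggregation, privacy accounting), but the clip-average-noise pattern at the heart of DP federated averaging can be sketched as follows (all parameter values and names are hypothetical):

```python
import random

def dp_federated_round(client_grads, clip_norm=1.0, noise_std=0.1, rng=None):
    """One round of differentially private federated averaging (sketch):
    clip each client update to `clip_norm` in L2, average, add Gaussian noise."""
    rng = rng or random.Random(0)
    clipped = []
    for g in client_grads:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, dim = len(clipped), len(clipped[0])
    avg = [sum(g[i] for g in clipped) / n for i in range(dim)]
    # Noise calibrated to the clipped sensitivity, scaled by participant count.
    return [a + rng.gauss(0.0, noise_std / n) for a in avg]

updates = [[0.5, -0.2], [3.0, 4.0], [-0.1, 0.1]]  # per-client model deltas
noisy_avg = dp_federated_round(updates)
```

    Clipping bounds any single client's influence on the aggregate, which is what makes the added Gaussian noise yield a differential-privacy guarantee.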
    "Shhh...be Quiet!" Reducing the Unwanted Interruptions of Notification Permission Prompts on Chrome
    Balazs Engedy
    Jud Porter
    Kamila Hasanbega
    Andrew Paseltiner
    Hwi Lee
    Edward Jung
    PJ McLachlan
    Jason James
    30th USENIX Security Symposium (USENIX Security 21), USENIX Association, Vancouver, B.C. (2021)
    Push notifications are an extremely useful feature. In web browsers, they allow users to receive timely updates even if the website is not currently open. On Chrome, the feature has become extremely popular since its inception in 2015, but it is also the least likely to be accepted by users. Our telemetry shows that, although 74% of all permission prompts are about notifications, they are also the least likely to be granted, with only a 10% grant rate on desktop and a 21% grant rate on Android. To preserve the feature's utility for websites and to reduce unwanted interruptions for users, we designed and tested a new UI for the notification permission prompt on Chrome. In this paper, we conduct two large-scale studies of Chrome users' interactions with the notification permission prompt in the wild, in order to understand how users interact with such prompts and to evaluate a novel design that we introduced in Chrome version 80 in February 2020. Our main goals for the redesigned UI are to reduce the unwanted interruptions that notification permission prompts cause for Chrome users, to reduce the frequency at which users have to suppress them, and to make it easier to change a previously made choice. Our results, based on an A/B test using behavioral data from more than 40 million users who interacted with more than 100 million prompts on more than 70 thousand websites, show that the new UI is very effective at reducing the unwanted interruptions and their frequency (up to 30% fewer unnecessary actions on the prompts), with a minimal impact (less than 5%) on the grant rates, across all types of users and websites. We achieve these results thanks to a novel adaptive activation mechanism coupled with a block list of interrupting websites, which is derived from crowd-sourced telemetry from Chrome clients.
    Nothing Standard About It: An Analysis of Minimum Security Standards in Organizations
    Jake Weidman
    Jens Grossklags
    ESORICS 2020, Computer Security, Springer International Publishing, pp. 263-282
    Written security policies are an important part of the complex set of measures to protect organizations from adverse events. However, research detailing these policies and their effectiveness is comparatively sparse. We tackle this research gap by conducting an analysis of a specific user-oriented sub-component of a full information security policy, the Minimum Security Standard. Specifically, we conduct an analysis of 29 publicly accessible minimum security standard documents from U.S. academic institutions. We study the prevalence of an extensive set of user-oriented provisions across these statements such as who is being addressed, whether the standard is considered binding and how it is being enforced, and which specific procedures and practices for users are introduced. We demonstrate significant diversity in focus, style and comprehensiveness in this sample of minimum security standards and discuss their significance within the overall security landscape of organizations.
    Reducing Permission Requests in Mobile Apps
    Martin Pelikan
    Ulfar Erlingsson
    Giles Hogben
    Proceedings of the ACM Internet Measurement Conference (IMC) (2019)
    Users of mobile apps sometimes express discomfort or concerns with what they see as unnecessary or intrusive permission requests by certain apps. However, encouraging mobile app developers to request fewer permissions is challenging because there are many reasons why permissions are requested; furthermore, prior work has shown it is hard to disambiguate the purpose of a particular permission with high certainty. In this work we describe a novel, algorithmic mechanism intended to discourage mobile-app developers from asking for unnecessary permissions. Developers are incentivized by an automated alert, or "nudge", shown in the Google Play Console when their apps ask for permissions that are requested by very few functionally-similar apps---in other words, by their competition. Empirically, this incentive is effective, with significant developer response since its deployment. Permissions have been redacted by 59% of apps that were warned, and this attenuation has occurred broadly across both app categories and app popularity levels. Importantly, billions of users' app installs from Google Play have benefited from these redactions.
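    The trigger condition for such a nudge -- a permission requested by very few functionally-similar apps -- can be sketched as a simple frequency check (the threshold and all names here are hypothetical, not the production logic):

```python
def unusual_permissions(app_perms, peer_perms, threshold=0.05):
    """Flag permissions that fewer than `threshold` of functionally-similar
    peer apps request. `peer_perms` is a list of permission lists, one per peer."""
    n = len(peer_perms)
    freq = {}
    for perms in peer_perms:
        for p in set(perms):
            freq[p] = freq.get(p, 0) + 1
    return sorted(p for p in app_perms if freq.get(p, 0) / n < threshold)

peers = [["CAMERA"], ["CAMERA", "LOCATION"]] * 10   # 20 hypothetical peer apps
flags = unusual_permissions(["CAMERA", "READ_SMS"], peers)  # READ_SMS is rare
```

    Comparing against functionally-similar apps rather than all apps is the key design choice: it makes the alert relevant to what the developer's competition actually requests.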
    Privacy in Geospatial Applications and Location-Based Social Networks
    Handbook of Mobile Data Privacy, Springer (2018), pp. 195-228
    The use of location data has greatly benefited from the availability of location-based services, the popularity of social networks, and the accessibility of public location data sets. However, in addition to providing users with the ability to obtain accurate driving directions or the convenience of geo-tagging friends and pictures, location is also a very sensitive type of data, as attested by more than a decade of research on different aspects of privacy related to location data. In this chapter, we focus on two domains that rely on location data as their core component: Geospatial applications (such as thematic maps and crowdsourced geo-information) and location-based social networks. We discuss the increasing relevance of geospatial applications to current location-aware services, and we describe relevant concepts such as volunteered geographic information and geo-surveillance and how they relate to privacy. Then, we focus on a subcategory of geospatial applications, location-based social networks, and we introduce the different entities (such as users, services and providers) that are involved in such networks, and we characterize their role and interactions. We present the main privacy challenges and we discuss the approaches that have been proposed to mitigate privacy risks in location-based social networks. Finally, we conclude with a discussion of open research questions and promising directions that will contribute to improving privacy for users of location-based social networks.
    Side-Channel Inference Attacks on Mobile Keypads using Smartwatches
    Anindya Maiti
    Murtuza Jadliwala
    Jibo He
    IEEE Transactions on Mobile Computing, 17 (2018), pp. 760-774
    Smartwatches enable many novel applications and are fast gaining popularity. However, the presence of a diverse set of on-board sensors provides an additional attack surface to malicious software and services on these devices. In this paper, we investigate the feasibility of key press inference attacks on handheld numeric touchpads by using smartwatch motion sensors as a side-channel. We consider different typing scenarios, and propose multiple attack approaches to exploit the characteristics of the observed wrist movements for inferring individual key presses. Experimental evaluation using a commercial off-the-shelf smartwatch and smartphone shows that key press inference using smartwatch motion sensors is not only fairly accurate, but also better than similar attacks previously demonstrated using smartphone motion sensors. Additionally, hand movements captured by a combination of both smartwatch and smartphone motion sensors yield better inference accuracy than either device considered individually.
    Towards Usable Checksums: Automating Web Downloads Verification for the Masses
    Alexandre Meylan
    Bertil Chapuis
    Kevin Huguenin
    Mathias Humbert
    Mauro Cherubini
    ACM CCS (2018)
    Internet users can download software for their computers from app stores (e.g., Mac App Store and Windows Store) or from other sources, such as the developers' websites. Most Internet users in the US rely on the latter, according to our representative study, which makes them directly responsible for the content they download. To enable users to detect if the downloaded files have been corrupted, developers can publish a checksum together with the link to the program file; users can then manually verify that the checksum matches the one they obtain from the downloaded file. In this paper, we assess the prevalence of such behavior among the general Internet population in the US (N=2,000), and we develop easy-to-use tools for users and developers to automate both checksum generation and verification. Specifically, we propose an extension to the recent W3C specification for sub-resource integrity in order to provide integrity protection for download links. Also, we develop an extension for the popular Chrome browser that computes and verifies checksums of downloaded files automatically, and an extension for the WordPress CMS that developers can use to easily attach checksums to their remote content. Our in situ experiments with 40 participants demonstrate the usability and effectiveness issues of manual checksum verification, and show that users find our extension desirable.
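    The manual step the browser extension automates amounts to hashing the downloaded file and comparing it against the published digest; a minimal sketch (not the extension's actual code, and the function names are illustrative):

```python
import hashlib
import hmac

def file_checksum(path, algo="sha256", chunk_size=1 << 16):
    """Compute the hex digest of a file, streaming so large downloads fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_hex):
    """Compare the computed digest against the published checksum
    (timing-safe comparison, case-insensitive on the published value)."""
    return hmac.compare_digest(file_checksum(path), expected_hex.lower())
```

    This is exactly what users are asked to do by hand today; automating it in the browser removes both the effort and the opportunity for comparison mistakes.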
    A great deal of research on the management of user data on smartphones via permission systems has revealed significant levels of user discomfort, lack of understanding, and lack of attention. The majority of these studies were conducted on Android devices before runtime permission dialogs were widely deployed. In this paper we explore how users make decisions with runtime dialogs on smartphones with Android 6.0 or higher. We employ an experience sampling methodology in order to ask users the reasons influencing their decisions immediately after they decide. We conducted a longitudinal survey with 157 participants over a 6-week period. We explore the grant and denial rates of permissions, overall and on a per permission type basis. Overall, our participants accepted 84% of the permission requests. We observe differences in the denial rates across permission types; these vary from 23% (for microphone) to 10% (for calendar). We find that one of the main factors in granting or denying a permission request is whether users expect the app to need that permission. A common reason for denying permissions is that users know they can change them later. Among the permissions granted, our participants said they were comfortable with 90% of those decisions, indicating that for 10% of grant decisions users may be consenting reluctantly. Interestingly, we found that women deny permissions twice as often as men.
    A Predictive Model for User Motivation and Utility Implications of Privacy Protection Mechanisms in Location Check-Ins
    Kevin Huguenin
    Joana Soares Machado
    Stefan Mihaila
    Reza Shokri
    Italo Dacosta
    Jean-Pierre Hubaux
    IEEE Transactions on Mobile Computing (2017)
    Location check-ins contain both geographical and semantic information about the visited venues. Semantic information is usually represented by means of tags (e.g., “restaurant”). Such data can reveal some personal information about users beyond what they actually expect to disclose, hence their privacy is threatened. To mitigate such threats, several privacy protection techniques based on location generalization have been proposed. Although the privacy implications of such techniques have been extensively studied, the utility implications are mostly unknown. In this paper, we propose a predictive model for quantifying the effect of a privacy-preserving technique (i.e., generalization) on the perceived utility of check-ins. We first study the users’ motivations behind their location check-ins, based on a study targeted at Foursquare users (N = 77). We propose a machine-learning method for determining the motivation behind each check-in, and we design a motivation-based predictive model for the utility implications of generalization. Based on the survey data, our results show that the model accurately predicts the fine-grained motivation behind a check-in in 43% of the cases and in 63% of the cases for the coarse-grained motivation. It also predicts, with a mean error of 0.52 (on a scale from 1 to 5), the loss of utility caused by semantic and geographical generalization. This model makes it possible to design utility-aware, privacy-enhancing mechanisms in location-based online social networks. It also enables service providers to implement location-sharing mechanisms that preserve both the utility and privacy for their users.
    Online services often rely on processing users’ data, which can be either provided directly by the users or combined from other services. Although users are aware of the latter, it is unclear whether they are comfortable with such data combination, whether they view it as beneficial for them, or the extent to which they believe that their privacy is exposed. Through an online survey (N=918) and follow-up interviews (N=14), we show that (1) comfort is highly dependent on the type of data, type of service and on the existence of a direct relationship with a company, (2) users have highly different opinions about the presence of benefits for them, irrespective of the context, and (3) users perceive the combination of online data as more identifying than data related to offline and physical behavior (such as location). Finally, we discuss several strategies for companies to improve upon these issues.
    (Smart) watch your taps: side-channel keystroke inference attacks using smartwatches
    Anindya Maiti
    Murtuza Jadliwala
    Jibo He
    ACM International Symposium on Wearable Computers (2015), pp. 27-30
    In this paper, we investigate the feasibility of keystroke inference attacks on handheld numeric touchpads by using smartwatch motion sensors as a side-channel. The proposed attack approach employs supervised learning techniques to accurately map the uniqueness in the captured wrist movements to each individual keystroke. Experimental evaluation shows that keystroke inference using smartwatch motion sensors is not only fairly accurate, but also better than similar attacks previously demonstrated using smartphone motion sensors.
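    The supervised mapping from wrist-motion features to keystrokes can be illustrated with a toy nearest-centroid classifier (the 2-D features and all data below are hypothetical; the paper's features and models are richer):

```python
def train_centroids(samples):
    """samples: list of (feature_vector, key_label) pairs.
    Returns the mean feature vector per key."""
    sums, counts = {}, {}
    for vec, key in samples:
        acc = sums.setdefault(key, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[key] = counts.get(key, 0) + 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}

def infer_key(centroids, vec):
    """Predict the key whose centroid is nearest in squared L2 distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda k: dist(centroids[k], vec))

# Hypothetical motion features (e.g., mean wrist displacement per axis).
training = [([0.1, 0.9], "1"), ([0.2, 1.0], "1"),
            ([0.9, 0.1], "9"), ([1.0, 0.2], "9")]
model = train_centroids(training)
```

    The point of the sketch is that distinct keys induce distinct wrist-movement signatures, so even a very simple learner can separate them once labeled training data is available.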
    A Machine-Learning Based Approach to Privacy-Aware Information-Sharing in Mobile Social Networks
    Kevin Huguenin
    Berker Agir
    Murtuza Jadliwala
    Maria Gazaki
    Jean-Pierre Hubaux
    Pervasive and Mobile Computing (PMC) (2016)
    Contextual information about users is increasingly shared on mobile social networks. Examples of such information include users’ locations, events, activities, and the co-presence of others in proximity. When disclosing personal information, users take into account several factors to balance privacy, utility and convenience — they want to share the “right” amount and type of information at each time, thus revealing a selective sharing behavior depending on the context, with a minimum amount of user interaction. In this article, we present SPISM, a novel information-sharing system that decides (semi-)automatically, based on personal and contextual features, whether to share information with others and at what granularity, whenever it is requested. SPISM makes use of (active) machine-learning techniques, including cost-sensitive multi-class classifiers based on support vector machines. SPISM provides both ease of use and privacy features: It adapts to each user’s behavior and predicts the level of detail for each sharing decision. Based on a personalized survey about information sharing, which involves 70 participants, our results provide insight into the most influential features behind a sharing decision, the reasons users share different types of information and their confidence in such decisions. We show that SPISM outperforms other kinds of policies; it achieves a median proportion of correct sharing decisions of 72% (after only 40 manual decisions). We also show that SPISM can be optimized to gracefully balance utility and privacy, but at the cost of a slight decrease in accuracy. Finally, we assess the potential of a one-size-fits-all version of SPISM.
    SecureRun: Cheat-Proof and Private Summaries for Location-Based Activities
    Anh Pham
    Kevin Huguenin
    Jean-Pierre Hubaux
    IEEE Transactions on Mobile Computing, PP (2015), pp. 1-14
    Activity-tracking applications, where people record and upload information about their location-based activities (e.g., the routes of their activities), are increasingly popular. Such applications enable users to share information and compete with their friends on activity-based social networks but also, in some cases, to obtain discounts on their health insurance premiums by proving they conduct regular fitness activities. However, they raise privacy and security issues: the service providers know the exact locations of their users; the users can report fake location information, for example, to unduly brag about their performance. In this paper, we present SecureRun, a secure privacy-preserving system for reporting location-based activity summaries (e.g., the total distance covered and the elevation gain). SecureRun is based on a combination of cryptographic techniques and geometric algorithms, and it relies on existing Wi-Fi access-point networks deployed in urban areas. We evaluate SecureRun by using real data sets from the FON hotspot community networks and from the Garmin Connect activity-based social network, and we show that it can achieve tight (up to a median accuracy of more than 80%) verifiable lower-bounds of the distance covered and of the elevation gain, while protecting the location privacy of the users with respect to both the social network operator and the access point network operator(s). The results of our online survey, targeted at RunKeeper users recruited through the Amazon Mechanical Turk platform, highlight the lack of awareness and significant concerns of the participants about the privacy and security issues of activity-tracking applications. They also show a good level of satisfaction regarding SecureRun and its performance.
    Predicting Users' Motivations behind Location Check-Ins and Utility Implications of Privacy Protection Mechanisms
    Kevin Huguenin
    Stefan Mihaila
    Reza Shokri
    Jean-Pierre Hubaux
    NDSS (2015)
    Location check-ins contain both geographical and semantic information about the visited venues, in the form of tags (e.g., “restaurant”). Such data might reveal some personal information about users beyond what they actually want to disclose, hence their privacy is threatened. In this paper, we study users’ motivations behind location check-ins, and we quantify the effect of a privacy-preserving technique (i.e., generalization) on the perceived utility of check-ins. By means of a targeted user study on Foursquare (N = 77), we show that the motivation behind Foursquare check-ins is a mediator of the loss of utility caused by generalization. Using these findings, we propose a machine learning method for determining the motivation behind each check-in, and we design a motivation-based predictive model for utility. Our results show that the model accurately predicts the loss of utility caused by semantic and geographical generalization; this model enables the design of utility-aware, privacy-enhancing mechanisms in location-based social networks.
    Secure and private proofs for location-based activity summaries in urban areas
    Anh Pham
    Kevin Huguenin
    Jean-Pierre Hubaux
    Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (2014)
    Activity-based social networks, where people upload and share information about their location-based activities (e.g., the routes of their activities), are increasingly popular. Such systems, however, raise privacy and security issues: The service providers know the exact locations of their users; the users can report fake location information in order to, for example, unduly brag about their performance. In this paper, we propose a secure privacy-preserving system for reporting location-based activity summaries (e.g., the total distance covered and the elevation gain). Our solution is based on a combination of cryptographic techniques and geometric algorithms, and it relies on existing Wi-Fi access-point networks deployed in urban areas. We evaluate our solution by using real data sets from the FON community networks and from the Garmin Connect activity-based social network, and we show that it can achieve tight (up to a median accuracy of 76%) verifiable lower-bounds of the distance covered and of the elevation gain, while protecting the location privacy of the users with respect to both the social network operator and the access-point network operator(s).
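    The geometric core of such a verifiable lower bound can be sketched: a user heard by an access point must lie within its coverage radius, so the leg between two consecutive APs is at least the distance between their centers minus both radii (coordinates and radii below are hypothetical; the full system adds cryptographic proofs on top):

```python
import math

def distance_lower_bound(aps):
    """aps: ordered list of (x, y, radius) for Wi-Fi access points the user
    connected to, in order. Sums a per-leg lower bound on distance traveled."""
    total = 0.0
    for (x1, y1, r1), (x2, y2, r2) in zip(aps, aps[1:]):
        total += max(0.0, math.hypot(x2 - x1, y2 - y1) - r1 - r2)
    return total

# Hypothetical AP sequence (meters): two 1 km-scale legs with 50 m radii.
route = [(0, 0, 50), (1000, 0, 50), (1000, 500, 50)]
bound = distance_lower_bound(route)  # (1000 - 100) + (500 - 100) = 1300.0
```

    Because the bound only ever under-counts, a cheater cannot inflate their summary: any claimed distance above the provable bound is rejected, while honest users lose only the slack introduced by the coverage radii.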