Hamza Harkous
Hamza Harkous is a Staff Research Scientist at Google, Zürich. He currently leads an effort to transform the data curation and model building process with large language models, driving advancements in privacy, safety, security, and beyond across Google's products. He previously architected the machine learning models behind Google’s Checks, the privacy compliance service. Prior to his tenure at Google, he worked at Amazon Alexa on natural language understanding and generation. He received his PhD in Computer Science from the Swiss Federal Institute of Technology in Lausanne (EPFL), where he also served as a postdoctoral researcher. During that time, he researched and developed tools for improving users’ comprehension of privacy practices and for automatically analyzing privacy policies. You can find more about his work on his personal homepage.
Authored Publications
    Website Data Transparency in the Browser
    Sebastian Zimmeck
    Daniel Goldelman
    Owen Kaplan
    Logan Brown
    Justin Casler
    Judeley Jean-Charles
    Joe Champeau
    24th Privacy Enhancing Technologies Symposium (PETS 2024) (to appear)
    Abstract: Data collection by websites and their integrated third parties is often not transparent. We design privacy interfaces for the browser to help people understand who is collecting which data from them. In a proof-of-concept browser extension, Privacy Pioneer, we implement a privacy popup, a privacy history interface, and a watchlist to notify people when their data is collected. For detecting location data collection, we develop a machine learning model based on TinyBERT, which reaches an average F1 score of 0.94. We supplement our model with deterministic methods to detect trackers, collection of personal data, and other monetization techniques. In a usability study with 100 participants, 82% found Privacy Pioneer easy to understand and 90% found it useful, indicating the value of privacy interfaces directly integrated in the browser.
    CookieEnforcer: Automated Cookie Notice Analysis and Enforcement
    Rishabh Khandelwal
    Asmit Nayak
    Kassem Fawaz
    32nd USENIX Security Symposium (2023)
    Abstract: Online websites use cookie notices to elicit consent from users, as required by recent privacy regulations like the GDPR and the CCPA. Prior work has shown that these notices are designed to manipulate users into making website-friendly choices that put users' privacy at risk. In this work, we present CookieEnforcer, a new system for automatically discovering cookie notices and extracting a set of instructions that result in disabling all non-essential cookies. To achieve this, we first build an automatic cookie notice detector that utilizes the rendering pattern of the HTML elements to identify cookie notices. Next, we analyze the cookie notices and predict the set of actions required to disable all unnecessary cookies. We do this by modeling the problem as a sequence-to-sequence task, where the input is a machine-readable cookie notice and the output is the set of clicks to make. We demonstrate the efficacy of CookieEnforcer via an end-to-end accuracy evaluation, showing that it generates the required steps in 93.7% of cases. Via a user study, we also show that CookieEnforcer significantly reduces user effort. Finally, we characterize the behavior of CookieEnforcer on the top 100k websites from the Tranco list, showcasing its stability and scalability.
    On the Potential of Mediation Chatbots for Mitigating Multiparty Privacy Conflicts - A Wizard-of-Oz Study
    Kavous Salehzadeh Niksirat
    Diana Korka
    Kévin Huguenin
    Mauro Cherubini
    The 26th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW) (2023) (to appear)
    Abstract: Sharing multimedia content without obtaining consent from the people involved causes multiparty privacy conflicts (MPCs). However, social-media platforms do not proactively protect users from the occurrence of MPCs. Hence, users resort to out-of-band, informal communication channels in an attempt to mitigate such conflicts. So far, previous works have focused on hard interventions that do not adequately consider contextual factors (e.g., social norms, cognitive priming) or that are employed too late (i.e., after the content has already been seen). In this work, we investigate the potential of conversational agents as a medium for negotiating and mitigating MPCs. We designed MediationBot, a mediator chatbot that encourages consent collection, enables users to explain their points of view, and proposes solutions for finding a middle ground. We evaluated our design using a Wizard-of-Oz experiment with N=32 participants, where we found that MediationBot can effectively help participants reach an agreement and prevent MPCs. It produced structured conversations in which participants had well-clarified speaking turns. Overall, our participants found MediationBot supportive, as it proposes useful middle-ground solutions. Our work informs the future design of mediator agents to support social-media users against MPCs.
    Abstract: Integrating user feedback is one of the pillars of building successful products. However, this feedback is generally collected in an unstructured, free-text form, which is challenging to understand at scale. This is particularly demanding in the privacy domain due to the nuances associated with the concept and the limited existing solutions. In this work, we present Hark, a system for discovering and summarizing privacy-related feedback at scale. Hark automates the entire process of summarizing privacy feedback, starting from unstructured text and resulting in a hierarchy of high-level privacy themes and fine-grained issues within each theme, along with representative reviews for each issue. At the core of Hark is a set of new deep learning models trained on different tasks, such as privacy feedback classification, privacy issue generation, and high-level theme creation. We illustrate Hark's efficacy on a corpus of 626M Google Play reviews. Out of this corpus, our privacy feedback classifier extracts 6M privacy-related reviews (with an AUC-ROC of 0.92). With three annotation studies, we show that Hark's generated issues are of high accuracy and coverage and that the theme titles are of high quality. We illustrate Hark's capabilities by presenting high-level insights from 1.3M Android apps.
    PriSEC: A Privacy Settings Enforcement Controller
    Rishabh Khandelwal
    Thomas Linden
    Kassem Fawaz
    30th USENIX Security Symposium (2021)
    Abstract: Online privacy settings aim to provide users with control over their data. However, in their current state, they suffer from usability and reachability issues. The recent push towards automatically analyzing privacy notices has not been accompanied by a similar effort for the more critical case of privacy settings. So far, the best efforts have targeted the special case of making opt-out pages more reachable. In this work, we present PriSEC, a Privacy Settings Enforcement Controller that leverages machine learning techniques towards a new paradigm for automatically enforcing web privacy controls. PriSEC goes beyond finding the webpages with privacy settings to discovering fine-grained options, presenting them in a searchable, centralized interface, and, most importantly, enforcing them on demand with minimal user intervention. We overcome the open nature of web development through novel algorithms that leverage the invariant behavior and rendering of webpages. We evaluate the performance of PriSEC and find that it precisely annotates the privacy controls for 94.3% of the control pages in our evaluation set. To demonstrate the usability of PriSEC, we conduct a user study with 148 participants. We show an average 3.75x reduction in the time taken to adjust privacy settings compared to the baseline system.
    Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
    Isabel Groves
    Amir Saffari
    The 28th International Conference on Computational Linguistics (COLING 2020) (to appear)
    Abstract: End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and the target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each of our components is learned end-to-end without the need for dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state-of-the-art results on automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with fluency assessed by human annotators as nearing or exceeding that of the human-written reference texts. We further demonstrate that the model-based semantic fidelity scorer in DataTuner is a better assessment tool than traditional, heuristic-based measures. Our generated text has significantly better semantic fidelity than the state of the art across all four datasets.
    The Privacy Policy Landscape after the GDPR
    Thomas Linden
    Rishabh Khandelwal
    Kassem Fawaz
    Proceedings on Privacy Enhancing Technologies (2019)
    The Applications of Machine Learning in Privacy Notice and Choice
    Kassem Fawaz
    Thomas Linden
    2019 11th International Conference on Communication Systems & Networks (COMSNETS)
    280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
    Amit Gupta
    Rémi Lebret
    Karl Aberer
    32nd AAAI Conference on Artificial Intelligence (2018)
    Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
    Kassem Fawaz
    Rémi Lebret
    Florian Schaub
    Kang G. Shin
    Karl Aberer
    27th USENIX Security Symposium (USENIX Security 18) (2018)
    Taxonomy induction using hypernym subsequences
    Amit Gupta
    Rémi Lebret
    Karl Aberer
    CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
    "If You Can't Beat them, Join them": A Usability Approach to Interdependent Privacy in Cloud Apps
    Karl Aberer
    Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy (CODASPY '17) (2017)
    PriBots: Conversational Privacy with Chatbots
    Kassem Fawaz
    Kang G. Shin
    Karl Aberer
    Workshop on the Future of Privacy Notices and Indicators, at the Twelfth Symposium on Usable Privacy and Security, SOUPS 2016
    The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps
    Rameez Rahman
    Bojan Karlas
    Karl Aberer
    16th Privacy Enhancing Technologies Symposium (PETS 2016)
    Data-Driven Privacy Indicators
    Rameez Rahman
    Karl Aberer
    Workshop On Privacy Indicators, at the Twelfth Symposium on Usable Privacy and Security, SOUPS 2016
    C3P: Context-Aware Crowdsourced Cloud Privacy
    Rameez Rahman
    Karl Aberer
    14th Privacy Enhancing Technologies Symposium (PETS 2014)
    Scalable and Secure Polling in Dynamic Distributed Networks
    Sébastien Gambs
    Rachid Guerraoui
    Florian Huc
    Anne-Marie Kermarrec
    31st IEEE International Symposium on Reliable Distributed Systems (2012)