Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 433 publications
    Permission Rationales in the Web Ecosystem: An Exploration of Rationale Text and Design Patterns
    Yusra Elbitar
    Soheil Khodayari
    Marian Harbach
    Gianluca De Stefano
    Balazs Engedy
    Giancarlo Pellegrino
    Sven Bugiel
    CHI 2025, ACM
    Preview abstract Modern web applications rely on features like camera and geolocation for personalized experiences, requiring user permission via browser prompts. To explain these requests, applications provide rationales—contextual information on why permissions are needed. Despite their importance, little is known about how rationales appear on the web or their influence on user decisions. This paper presents the first large-scale study of how the web ecosystem handles permission rationales, covering three areas: (i) identifying webpages that use permissions, (ii) detecting and classifying permission rationales, and (iii) analyzing their attributes to understand their impact on user decisions. We examined over 770K webpages from Chrome telemetry, finding 3.6K unique rationale texts and 749 rationale UIs across 85K pages. We extracted key rationale attributes and assessed their effect on user behavior by cross-referencing them with Chrome telemetry data. Our findings reveal nine key insights, providing the first evidence of how different rationales affect user decisions. View details
    Preview abstract The dominant paradigm in image retrieval systems today is to search large databases using global image features, and re-rank those initial results with local image feature matching techniques. This design, dubbed global-to-local, stems from the computational cost of local matching approaches, which can only be afforded for a small number of retrieved images. However, emerging efficient local feature search approaches have opened up new possibilities, in particular enabling detailed retrieval at large scale, to find partial matches which are often missed by global feature search. In parallel, global feature-based re-ranking has shown promising results with high computational efficiency. In this work, we leverage these building blocks to introduce a local-to-global retrieval paradigm, where efficient local feature search meets effective global feature re-ranking. Critically, we propose a re-ranking method where global features are computed on-the-fly, based on the local feature retrieval similarities. Such re-ranking-only global features, dubbed similarity embeddings, leverage multidimensional scaling techniques to create embeddings which respect the local similarities obtained during search, enabling a significant re-ranking boost. Experimentally, we demonstrate unprecedented retrieval performance on the Revisited Oxford and Paris datasets, setting new state-of-the-art results. View details
    Preview abstract In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for true relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists, around 500 documents in-context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR. View details
    Beyond Retrieval: Generating Narratives in Conversational Recommender Systems
    Krishna Sayana
    Raghavendra Vasudeva
    Yuri Vasilevski
    Kun Su
    Liam Hebert
    James Pine
    Hubert Pham
    Ambarish Jash
    Sukhdeep Sodhi
    (2025)
    Preview abstract Large Language Models (LLMs) have shown remarkable progress in generating human-quality text and engaging in complex reasoning. This presents a unique opportunity to revolutionize conversational recommender systems by enabling them to generate rich, engaging and personalized narratives that go beyond recommendations. However, the lack of suitable datasets limits research in this area. This paper addresses this challenge by making two key contributions. First, we introduce REGEN (Reviews Enhanced with GEnerative Narratives), a new dataset extending the Amazon Product Reviews with rich user narratives. Furthermore, we perform an extensive automated evaluation of the dataset using a rater LLM. Second, the paper introduces a fusion architecture (CF model with an LLM) which serves as a baseline for REGEN. To the best of our knowledge, this represents the first attempt to analyze the capabilities of LLMs in understanding recommender signals and generating rich narratives. We demonstrate that LLMs can effectively learn from simple fusion architectures utilizing interaction-based CF embeddings, and this can be further enhanced using the metadata and personalization data associated with items. Our experiments show that combining CF and content embeddings leads to improvements of 4-12% in key language metrics compared to using either type of embedding individually. We also provide an analysis to interpret their contributions to this new generative task. View details
    Preview abstract The increasing complexity of cybersecurity and artificial intelligence (AI) executive orders, frameworks, and policies has made translating high-level directives into actionable implementation a persistent challenge. Policymakers, framework authors, and engineering teams often lack a unified approach for interpreting and operationalizing these documents, resulting in inefficiencies, misalignment, and delayed compliance. While existing standards such as the Open Security Controls Assessment Language (OSCAL) address control-level specifications, no standardized, machine-readable format exists for authoring and structuring high-level governance documents. This gap hinders collaboration across disciplines and obscures critical directives’ underlying intent and rationale. This report introduces Governance Schema (GovSCH), an open-source schema designed to standardize the authoring and translation of cybersecurity and AI governance documents into a consistent, machine-readable format. By analyzing prior executive orders, regulatory frameworks, and policies, GovSCH identifies common structures and authoring practices to create an interoperable model that bridges policymakers, regulatory framework authors, and engineering teams. This approach enables more precise articulation of policy intent, improves transparency, and accelerates the technical implementation of regulations. Ultimately, GovSCH aims to enhance collaboration, standardization, and efficiency in cybersecurity and AI governance. View details
    Preview abstract A recent large-scale experiment conducted by Chrome has demonstrated that a "quieter" web permission prompt can reduce unwanted interruptions while only marginally affecting grant rates. However, the experiment and the partial roll-out were missing two important elements: (1) an effective and context-aware activation mechanism for such a quieter prompt, and (2) an analysis of user attitudes and sentiment towards such an intervention. In this paper, we address these two limitations by means of a novel ML-based activation mechanism -- and its real-world on-device deployment in Chrome -- and a large-scale user study with 13.1k participants from 156 countries. First, the telemetry-based results, computed on more than 20 million samples from Chrome users in-the-wild, indicate that the novel on-device ML-based approach is both extremely precise (>99% post-hoc precision) and has very high coverage (96% recall for notifications permission). Second, our large-scale, in-context user study shows that quieting is often perceived as helpful and does not cause high levels of unease for most respondents. View details
    Preview abstract The web utilizes permission prompts to moderate access to certain capabilities. We present the first investigation of user behavior and sentiment of this security and privacy measure on the web, using 28 days of telemetry data from more than 100M Chrome installations on desktop platforms and experience sampling responses from 25,706 Chrome users. Based on this data, we find that ignoring and dismissing permission prompts are most common for geolocation and notifications. Permission prompts are perceived as more annoying and interrupting when they are not allowed, and most respondents cite a rational reason for the decision they took. Our data also supports that the perceived availability of contextual information from the requesting website is associated with allowing access to a requested capability. More usable permission controls could facilitate adoption of best practices that address several of the identified challenges; and ultimately could lead to better user experiences and a safer web. View details
    Scaling Up LLM Reviews for Google Ads Content Moderation
    Ariel Fuxman
    Chih-Chun Chia
    Dongjin Kwon
    Enming Luo
    Mehmet Tek
    Ranjay Krishna
    Tiantian Fang
    Tushar Dogra
    Yu-Han Lyu
    (2024)
    Preview abstract Large language models (LLMs) are powerful tools for content moderation, but LLM inference costs and latency on large volumes of data, such as the Google Ads repository, are prohibitive for their casual usage. This study is focused on scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. Then, LLMs are used to review only the representative ads. Finally, we propagate the LLM decisions for representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving a 2x recall compared to a non-LLM model as a baseline. Note that the success of this approach is a strong function of the representations used in clustering and label propagation; we observed that cross-modal similarity representations yield better results than uni-modal representations. View details
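The cluster, review, and propagate steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `review_by_representative` and `toy_review` are hypothetical names, the clusters are given by hand rather than computed from learned representations, and the first member of each cluster is arbitrarily taken as its representative.

```python
from typing import Callable, Dict, Hashable, List


def review_by_representative(
    clusters: Dict[Hashable, List[str]],
    review: Callable[[str], str],
) -> Dict[str, str]:
    """Review one representative per cluster and copy its label to every member."""
    decisions: Dict[str, str] = {}
    for members in clusters.values():
        if not members:
            continue
        # Assumption: the first member stands in for the whole cluster.
        representative = members[0]
        # The expensive review call happens once per cluster, not once per ad.
        label = review(representative)
        for ad in members:
            decisions[ad] = label
    return decisions


def toy_review(ad: str) -> str:
    # Hypothetical stand-in for an LLM review; a real reviewer would be a model call.
    return "violating" if "miracle" in ad else "ok"


clusters = {
    0: ["miracle cure A", "miracle cure B", "miracle cure C"],
    1: ["running shoes", "running shoes sale"],
}
decisions = review_by_representative(clusters, toy_review)
```

Here five ads receive labels from only two review calls. The saving grows with cluster size, and the quality of the propagated labels depends on how well the clustering groups ads that deserve the same decision.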
    (In)Security of File Uploads in Node.js
    Harun Oz
    Abbas Acar
    Ahmet Aris
    Amin Kharraz
    Selcuk Uluagac
    The Web Conference (WWW) (2024)
    Preview abstract File upload is a critical feature incorporated by a myriad of web applications to enable users to share and manage their files conveniently. It has been used in many useful services such as file-sharing and social media. While file upload is an essential component of web applications, the lack of rigorous checks on the file name, type, and content of the uploaded files can result in security issues, often referred to as Unrestricted File Upload (UFU). In this study, we analyze the (in)security of popular file upload libraries and real-world applications in the Node.js ecosystem. To automate our analysis, we propose NodeSec, a tool designed to analyze file upload insecurities in Node.js applications and libraries. NodeSec generates unique payloads and thoroughly evaluates the application’s file upload security against 13 distinct UFU-type attacks. Utilizing NodeSec, we analyze the most popular file upload libraries and real-world applications in the Node.js ecosystem. Our results reveal that some real-world web applications are vulnerable to UFU attacks and disclose serious security bugs in file upload libraries. As of this writing, we received 19 CVEs and two US-CERT cases for the security issues that we reported. Our findings provide strong evidence that the dynamic features of Node.js applications introduce security shortcomings and that web developers should be cautious when implementing file upload features in their applications. View details
    Beyond Yes and No: Improving Zero-Shot Pointwise LLM Rankers via Scoring Fine-Grained Relevance Labels
    Michael Bendersky
    Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
    Preview abstract Zero-shot text rankers powered by recent LLMs achieve remarkable ranking performance by simply prompting. Existing prompts for pointwise LLM rankers mostly ask the model to choose from binary relevance labels like "Yes" and "No". However, the lack of intermediate relevance label options may cause the LLM to provide noisy or biased answers for documents that are partially relevant to the query. We propose to incorporate fine-grained relevance labels into the prompt for LLM rankers, enabling them to better differentiate among documents with different levels of relevance to the query and thus derive a more accurate ranking. We study two variants of the prompt template, coupled with different numbers of relevance levels. Our experiments on 8 BEIR data sets show that adding fine-grained relevance labels significantly improves the performance of LLM rankers. View details
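One way to turn fine-grained relevance labels into a single ranking score is to take the expected label value under the model's label probabilities. The abstract does not say how scores are derived, so the sketch below is an assumption for illustration only: `expected_relevance`, the three-level label scale, and the probabilities are all hypothetical.

```python
from typing import Sequence


def expected_relevance(label_probs: Sequence[float]) -> float:
    """Collapse probabilities over graded labels 0..k-1 into one ranking score."""
    total = sum(label_probs)
    if total <= 0:
        raise ValueError("label probabilities must sum to a positive value")
    # Weight each grade by its (renormalized) probability.
    return sum(grade * p for grade, p in enumerate(label_probs)) / total


# Hypothetical label distributions over
# ("Not Relevant", "Somewhat Relevant", "Highly Relevant").
docs = {
    "d1": [0.7, 0.2, 0.1],
    "d2": [0.1, 0.3, 0.6],
    "d3": [0.2, 0.6, 0.2],
}
ranking = sorted(docs, key=lambda d: expected_relevance(docs[d]), reverse=True)
```

With a binary "Yes"/"No" prompt, the partially relevant document `d3` would have to be forced into one of two buckets; with graded labels it lands between `d2` and `d1`.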
    Preview abstract Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these challenging ranking formulations. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL 2019&2020, PRP based on the Flan-UL2 model with 20B parameters performs favorably with the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, while outperforming other LLM-based solutions, such as InstructGPT which has 175B parameters, by over 10% for all ranking metrics. By using the same prompt template on seven BEIR tasks, PRP outperforms supervised baselines and outperforms the blackbox commercial ChatGPT solution by 4.2% and pointwise LLM-based solutions by more than 10% on average NDCG@10. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity. View details
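A pairwise prompting ranker can be sketched as a comparison function plus an aggregation step. The version below is a simplified all-pairs illustration, not the paper's implementation: `rank_all_pairs` and `toy_prefers` are hypothetical names, the comparator is a word-overlap stand-in for an LLM call, and all-pairs comparison is quadratic in the number of documents, unlike the more efficient variants the abstract mentions.

```python
from itertools import permutations
from typing import Callable, Dict, List


def rank_all_pairs(
    query: str,
    docs: List[str],
    prefers: Callable[[str, str, str], bool],
) -> List[str]:
    """Rank documents by how many pairwise comparisons each one wins."""
    wins: Dict[str, int] = {d: 0 for d in docs}
    # permutations() yields both (a, b) and (b, a), so each pair is judged
    # in both orders, which dampens sensitivity to prompt position.
    for a, b in permutations(docs, 2):
        if prefers(query, a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(docs, key=lambda d: wins[d], reverse=True)


def toy_prefers(query: str, a: str, b: str) -> bool:
    # Hypothetical stand-in for asking an LLM "which passage is more relevant?".
    def overlap(doc: str) -> int:
        return len(set(query.split()) & set(doc.split()))

    return overlap(a) >= overlap(b)


docs = ["stock prices", "dogs", "cats and dogs"]
ranked = rank_all_pairs("cats dogs", docs, toy_prefers)
```

Each comparison asks the model only for a relative judgment between two passages, which is a simpler task than producing a calibrated score or a full ordered list.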
    Preview abstract Browser fingerprinting is often associated with cross-site user tracking, a practice that many browsers (e.g., Safari, Brave, Edge, Firefox and Chrome) want to block. However, less is publicly known about its uses to enhance online safety, where it can provide an additional security layer against service abuses (e.g., in combination with CAPTCHAs) or during user authentication. To the best of our knowledge, no fingerprinting defenses deployed thus far consider this important distinction when blocking fingerprinting attempts, so they might negatively affect website functionality and security. To address this issue we make three main contributions. First, we propose and evaluate a novel machine learning-based method to automatically identify authentication pages (i.e. sign-in and sign-up pages). Our algorithm -- which relies on a hybrid unsupervised/supervised approach -- achieves 96-98% precision and recall on a large, manually-labelled dataset of 10,000 popular sites. Second, we compare our algorithm with other methods from prior works on the same dataset, showing that it significantly outperforms all of them (+83% F1-score). Third, we quantify the prevalence of fingerprinting scripts across sign-in and sign-up pages (9.2%) versus those executed on other pages (8.9%); while the rates of fingerprinting are similar, home pages and authentication pages differ in the third-party scripts they include and how often these scripts are labeled as tracking. We also highlight the substantial differences in fingerprinting behavior on login and sign-up pages. Our work sheds light on the complicated reality that fingerprinting is used to both protect user security and invade user privacy, and that this dual nature must be considered by fingerprinting mitigations. View details
    Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
    Minghan Li
    Jimmy Lin
    Michael Bendersky
    Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24) (2024)
    Preview abstract Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. A recent study shows that current expansion techniques benefit weaker models but harm stronger rankers. In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to different cross-encoder rankers and verify the deteriorated zero-shot effectiveness. We identify two vital steps in the experiment: high-quality keyword generation and minimally-disruptive query modification. We show that it is possible to improve the generalization of a strong neural ranker by generating keywords through a reasoning chain and aggregating the ranking results of each expanded query via self-consistency, reciprocal rank weighting, and fusion. Experiments on BEIR and TREC Deep Learning 2019/2020 show that the nDCG@10 scores of both MonoT5 and RankT5 following these steps are improved, which points out a direction for applying query expansion to strong cross-encoder rankers. View details
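Aggregating the rankings produced by several expanded queries can be illustrated with standard reciprocal rank fusion (RRF). This is a swapped-in, well-known technique used here only as an example; the abstract does not specify the exact weighting the authors use, and the constant `k = 60` and the document IDs below are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists; a document scores 1 / (k + rank) per list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


rankings = [
    ["d1", "d2", "d3"],  # ranking for the original query
    ["d2", "d1", "d3"],  # ranking for a first expanded query
    ["d2", "d3", "d1"],  # ranking for a second expanded query
]
fused = reciprocal_rank_fusion(rankings)
```

Because the fused score depends only on rank positions, the rankers' raw scores for different expanded queries do not need to be on a comparable scale.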
    A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
    Shengyao Zhuang
    Bevan Koopman
    Guido Zuccon
    Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24) (2024)
    Preview abstract We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at https://github.com/ielab/llm-rankers. View details
    Statistical Analysis of Cardiovascular Diseases Dataset of BRFSS
    Ashank Anshuman
    Aakarshit Uppal
    Indrajit Mukherjee
    Open Access Library Journal, 11 (2024)
    Preview abstract Cardiovascular Diseases (CVDs) remain a leading cause of death in the United States. These diseases, including coronary heart disease, heart attack, and stroke, pose significant health risks. Accurate prediction of CVD probability can aid in prevention and management. To address this challenge, we analyzed data from the Behavioral Risk Factor Surveillance System (BRFSS) spanning 1995-2017. We developed innovative methods to handle missing data and normalize values. Deep learning models were employed to predict risk factors and, subsequently, the likelihood of CVDs. Our models were implemented using TensorFlow and trained on a high-performance computing server. The models accurately predicted risk factors with over 90% accuracy, enabling targeted interventions. We successfully predicted CVD probability with greater than 95% accuracy, providing valuable insights for healthcare providers. An online portal was developed to forecast CVD trends over the next 31 years, facilitating proactive planning and resource allocation. View details