Information Retrieval and the Web

The science surrounding search engines is commonly referred to as information retrieval, in which algorithmic principles are developed to match user interests to the best information about those interests.

Google started as a result of our founders' attempt to find the best matching between the user queries and Web documents, and do it really fast. During the process, they uncovered a few basic principles: 1) best pages tend to be those linked to the most; 2) best description of a page is often derived from the anchor text associated with the links to a page. Theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.

Search and Information Retrieval on the Web has advanced significantly from those early days: 1) the notion of ""information"" has greatly expanded from documents to much richer representations such as images, videos, etc., 2) users are increasingly searching on their Mobile devices with very different interaction characteristics from search on the Desktops; 3) users are increasingly looking for direct information, such as answers to a question, or seeking to complete tasks, such as appointment booking. Through our research, we are continuing to enhance and refine the world's foremost search engine by aiming to scientifically understand the implications of those changes and address new challenges that they bring.

Recent Publications

Preview abstract In-context Ranking (ICR) is an emerging paradigm for Information Retrieval (IR), which leverages contextual understanding of LLMs by directly incorporating the task description, candidate documents, and the query into the model's input prompt and tasking the LLM to identify relevant document(s). While it is effective, efficiency is a significant challenge in this paradigm, especially as the candidate list grows due to quadratic/super-linear scaling of attention operation with context length. To this end, this paper first identifies inherent and exploitable structures in the attention of LLMs finetuned for ICR: (1) inter-document block sparsity: attention is dense within each document block but sparse across different documents in the context; and (2) query-document block relevance: the attention scores from certain query tokens to a document block in middle layers strongly correlate with that document's actual relevance. Motivated by these observations, we introduce BlockRank (Blockwise In-context Ranking), a novel method that adapts the attention operation in an LLM by (a) architecturally enforcing the observed inter-document block sparsity, reducing attention complexity from quadratic to linear without loss in performance, and (b) optimizing query-document block relevance for true relevant documents during fine-tuning using an auxiliary contrastive training objective, improving retrieval in attention. Experiments on BEIR, MSMarco and NQ with Mistral-7B demonstrate that BlockRank Mistral matches or outperforms existing SOTA listwise rankers and controlled fine-tuned baseline while being significantly more efficient at inference (4.7x for 100 MSMarco documents in context) and scaling gracefully to long-context shortlists, around 500 documents in-context (approximately 100K context length) within a second, presenting a scalable and effective solution for ICR. View details
Preview abstract The increasing complexity of cybersecurity and artificial intelligence (AI) executive orders, frameworks, and policies has made translating high-level directives into actionable implementation a persistent challenge. Policymakers, framework authors, and engineering teams often lack a unified approach for interpreting and operationalizing these documents, resulting in inefficiencies, misalignment, and delayed compliance. While existing standards such as the Open Security Controls Assessment Language (OSCAL) address control-level specifications, no standardized, machine-readable format exists for authoring and structuring high-level governance documents. This gap hinders collaboration across disciplines and obscures critical directives’ underlying intent and rationale. This report introduces Governance Schema (GovSCH), an open-source schema designed to standardize the authoring and translation of cybersecurity and AI governance documents into a consistent, machine-readable format. By analyzing prior executive orders, regulatory frameworks, and policies, GovSCH identifies common structures and authoring practices to create an interoperable model that bridges policymakers, regulatory framework authors, and engineering teams. This approach enables more precise articulation of policy intent, improves transparency, and accelerates the technical implementation of regulations. Ultimately, GovSCH aims to enhance collaboration, standardization, and efficiency in cybersecurity and AI governance. View details
Preview abstract The dominant paradigm in image retrieval systems today is to search large databases using global image features, and re-rank those initial results with local image feature matching techniques. This design, dubbed \emph{global-to-local}, stems from the computational cost of local matching approaches, which can only be afforded for a small number of retrieved images. However, emerging efficient local feature search approaches have opened up new possibilities, in particular enabling detailed retrieval at large scale, to find partial matches which are often missed by global feature search. In parallel, global feature-based re-ranking has shown promising results with high computational efficiency. In this work, we leverage these building blocks to introduce a \emph{local-to-global} retrieval paradigm, where efficient local feature search meets effective global feature re-ranking. Critically, we propose a re-ranking method where global features are computed on-the-fly, based on the local feature retrieval similarities. Such re-ranking-only global features, dubbed \emph{similarity embeddings}, leverage multidimensional scaling techniques to create embeddings which respect the local similarities obtained during search, enabling a significant re-ranking boost. Experimentally, we demonstrate unprecedented retrieval performance on the Revisited Oxford and Paris datasets, setting new state-of-the-art results. View details
Permission Rationales in the Web Ecosystem: An Exploration of Rationale Text and Design Patterns
Yusra Elbitar
Soheil Khodayari
Gianluca De Stefano
Balazs Engedy
Giancarlo Pellegrino
Sven Bugiel
CHI 2025, ACM
Preview abstract Modern web applications rely on features like camera and geolocation for personalized experiences, requiring user permission via browser prompts. To explain these requests, applications provide rationales—contextual information on why permissions are needed. Despite their importance, little is known about how rationales appear on the web or their influence on user decisions. This paper presents the first large-scale study of how the web ecosystem handles permission rationales, covering three areas: (i) identifying webpages that use permissions, (ii) detecting and classifying permission rationales, and (iii) analyzing their attributes to understand their impact on user decisions. We examined over 770K webpages from Chrome telemetry, finding 3.6K unique rationale texts and 749 rationale UIs across 85K pages. We extracted key rationale attributes and assessed their effect on user behavior by cross-referencing them with Chrome telemetry data. Our findings reveal nine key insights, providing the first evidence of how different rationales affect user decisions. View details
Beyond Retrieval: Generating Narratives in Conversational Recommender Systems
Krishna Sayana
Raghavendra Vasudeva
Yuri Vasilevski
Kun Su
Liam Hebert
James Pine
Hubert Pham
Ambarish Jash
Sukhdeep Sodhi
(2025)
Preview abstract Large Language Models (LLMs) have shown remarkable progress in generating human-quality text and engaging in complex reasoning. This presents a unique opportunity to revolutionize conversational recommender systems by enabling them to generate rich, engaging and personalized narratives that go beyond recommendations. However, the lack of suitable datasets limits research in this area. This paper addresses this challenge by making two key contributions. First, we introduce REGEN Reviews Enhanced with GEnerative Narratives, a new dataset extending the Amazon Product Reviews with rich user narratives. Furthermore, we perform an extensive automated evaluation of the dataset using a rater LLM. Second, the paper introduces a fusion architecture (CF model with an LLM) which serves as a baseline for REGEN. To the best of our knowledge, this represents the first attempt to analyze the capabilities of LLMs in understanding recommender signals and generating rich narratives. We demonstrate that LLMs can effectively learn from simple fusion architectures utilizing interaction-based CF embeddings, and this can be further enhanced using the metadata and personalization data associated with items. Our experiments show that combining CF and content embeddings leads to improvements of 4-12% in key language metrics compared to using either type of embedding individually. We also provide an analysis to interpret their contributions to this new generative task. View details
Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation
Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (2025)
Preview abstract Large language models (LLMs) are increasingly integral to information retrieval (IR), powering ranking, evaluation, and AI-assisted content creation. This widespread adoption necessitates a critical examination of potential biases arising from the interplay between these LLM-based components. This paper synthesizes existing research and presents novel experiment designs that explore how LLM-based rankers and assistants influence LLM-based judges. We provide the first empirical evidence of LLM judges exhibiting significant bias towards LLM-based rankers. Furthermore, we observe limitations in LLM judges' ability to discern subtle system performance differences. Contrary to some previous findings, our preliminary study does not find evidence of bias against AI-generated content. These results highlight the need for a more holistic view of the LLM-driven information ecosystem. To this end, we offer initial guidelines and a research agenda to ensure the reliable use of LLMs in IR evaluation. View details
×