Information Retrieval and the Web

The science surrounding search engines is commonly referred to as information retrieval, in which algorithmic principles are developed to match user interests to the best information about those interests.

Google started as a result of our founders' attempt to find the best match between user queries and Web documents, and to do it really fast. In the process, they uncovered a few basic principles: 1) the best pages tend to be those that are linked to the most; 2) the best description of a page is often derived from the anchor text associated with the links to that page. Theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.
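The two principles above can be illustrated with a small sketch. This is a toy under stated assumptions, not a description of how Google's ranking works: the link graph, the `.example` page names, and the function names `inlink_counts`, `anchor_descriptions`, and `rank` are all hypothetical. It simply counts incoming links and treats the anchor text of those links as the description of the target page.

```python
from collections import Counter, defaultdict

# Hypothetical toy link graph: (source page, target page, anchor text).
LINKS = [
    ("a.example", "c.example", "python tutorial"),
    ("b.example", "c.example", "learn python"),
    ("a.example", "b.example", "cooking blog"),
    ("d.example", "c.example", "python tutorial"),
]


def inlink_counts(links):
    """Principle 1: count how many links point at each target page."""
    return Counter(target for _, target, _ in links)


def anchor_descriptions(links):
    """Principle 2: describe each page by the anchor-text terms of its in-links."""
    terms = defaultdict(Counter)
    for _, target, anchor in links:
        terms[target].update(anchor.lower().split())
    return terms


def rank(query, links):
    """Order pages by anchor-text matches, breaking ties by in-link count."""
    counts = inlink_counts(links)
    terms = anchor_descriptions(links)
    query_terms = query.lower().split()
    scored = []
    for page, page_terms in terms.items():
        # A Counter returns 0 for terms it has never seen.
        matches = sum(page_terms[t] for t in query_terms)
        if matches:
            scored.append((matches, counts[page], page))
    scored.sort(reverse=True)
    return [page for _, _, page in scored]
```

With this toy data, `rank("python", LINKS)` returns only `c.example`, because three links point to it and all three use "python" in their anchor text, even though the sketch never looks at the page's own content.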

Search and information retrieval on the Web have advanced significantly since those early days: 1) the notion of "information" has greatly expanded from documents to much richer representations such as images and videos; 2) users are increasingly searching on their mobile devices, with very different interaction characteristics from search on desktops; 3) users are increasingly looking for direct information, such as answers to a question, or seeking to complete tasks, such as appointment booking. Through our research, we are continuing to enhance and refine the world's foremost search engine by aiming to scientifically understand the implications of those changes and address the new challenges that they bring.

Recent Publications

Websites Need Your Permission Too – User Sentiment and Decision Making on Web Permission Prompts in Desktop Chrome

Marian Harbach

CHI 2024, ACM

(In)Security of File Uploads in Node.js

Harun Oz

Abbas Acar

Ahmet Aris

Güliz Seray Tuncay

Amin Kharraz

Selcuk Uluagac

The Web Conference (WWW) (2024)

Beyond Yes and No: Improving Zero-Shot Pointwise LLM Rankers via Scoring Fine-Grained Relevance Labels

Honglei Zhuang

Zhen Qin

Kai Hui

Junru Wu

Le Yan

Xuanhui Wang

Michael Bendersky

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

Don’t Interrupt Me – A Large-Scale Study of On-Device Permission Prompt Quieting in Chrome

Marian Harbach

Igor Bilogrevic

Enrico Bacis

Serena Chen

Ravjit Uppal

Andy Paicu

Elias Klim

Meggyn Watkins

Balazs Engedy

(2024)

Scaling Up LLM Reviews for Google Ads Content Moderation

Ariel Fuxman

Chih-Chun Chia

Chun-Ta Lu

Dongjin Kwon

Enming Luo

Mehmet Tek

Otilia Stretcu

Ranjay Krishna

Tiantian Fang

Tushar Dogra

Wei Qiao

Yu-Han Lyu

Yuan Wang

(2024)

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

Zhen Qin

Rolf Jagerman

Kai Hui

Honglei Zhuang

Junru Wu

Le Yan

Jiaming Shen

Tianqi Liu

Jialu Liu

Don Metzler

Xuanhui Wang

Michael Bendersky

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2024)
