Toward Whole-session Relevance: Exploring Intrinsic Diversity in Web Search
Abstract
Current research on web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [20] which span multiple queries across one or more search sessions [26,24]. An ideal search engine would not only retrieve relevant results for a user's particular query but also be able to identify when the user is engaged in a more complex task and aid the user in completing that task [29,1]. Toward optimizing whole-session or task relevance, we characterize and address the problem of intrinsic diversity (ID) in retrieval [30], a type of complex task that requires multiple interactions with current search engines. Unlike existing work on extrinsic diversity [30] that deals with ambiguity in intent across multiple users, ID queries often have little ambiguity in intent but seek content covering a variety of aspects on a shared theme. In such scenarios, the underlying needs are typically exploratory, comparative, or breadth-oriented in nature. We identify and address three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.