Filip Radlinski

Filip Radlinski is a research scientist at Google in London, UK. He received his PhD from Cornell University and a BSc (Hons) from the Australian National University. His research interests include conversational search and recommendation, online evaluation and machine learning. More of his publications are listed on Google Scholar.
Authored Publications
    Measuring the Impact of Explanation Bias: A Study of Natural Language Justifications for Recommender Systems
    Andrey Petrov
    Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23), ACM
    Despite the potential impact of explanations on decision making, there is a lack of research on quantifying the effect of explanations on users' choices. This paper presents an experimental protocol for measuring the degree to which positively or negatively biased explanations can lead to users choosing suboptimal recommendations. Key elements of this protocol include a preference elicitation stage to allow for personalizing recommendations, manual identification and extraction of item aspects from reviews, and a controlled method for introducing bias through the combination of both positive and negative aspects. We also present explanations in two different textual formats: as a list of item aspects and as fluent natural language text. Through a user study with 129 participants, we demonstrate that explanations can significantly affect users' selections and that these findings generalize across explanation formats.
    Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
    Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset
    Arun Chaganty
    Megan Leszczynski
    Shu Zhang
    Ravi Ganti
    Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)
    Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an item set exponentiates the search space that recommender systems must consider (all subsets of items!): this motivates conversational approaches (where users explicitly state or refine their preferences and systems elicit preferences in natural language) as an efficient way to understand user needs. We call this task conversational item set curation and present a novel data collection methodology that efficiently collects realistic preferences about item sets in a conversational setting by observing both item-level and set-level feedback. We apply this methodology to music recommendation to build the Conversational Playlist Curation Dataset (CPCD), where we show that it leads raters to express preferences that would not be otherwise expressed. Finally, we propose a wide range of conversational retrieval models as baselines for this task and evaluate them on the dataset.
    Conversational Information Seeking
    Hamed Zamani
    Johanne R. Trippas
    Jeff Dalton
    Foundations and Trends® in Information Retrieval (2023), pp. 244-456
    Conversational information seeking (CIS) is concerned with a sequence of interactions between one or more users and an information system. Interactions in CIS are primarily based on natural language dialogue, while they may include other types of interactions, such as click, touch, and body gestures. This monograph provides a thorough overview of CIS definitions, applications, interactions, interfaces, design, implementation, and evaluation. This monograph views CIS applications as including conversational search, conversational question answering, and conversational recommendation. Our aim is to provide an overview of past research related to CIS, introduce the current state-of-the-art in CIS, highlight the challenges still being faced in the community, and suggest future directions.
    Resolving Indirect Referring Expressions for Entity Selection
    Silvia Pareti
    Proceedings of the Annual Meetings of the Association for Computational Linguistics (ACL 2023)
    Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real world entities. For example, given the choice 'Should we make a Simnel cake or a Pandan cake?' a natural response from a non-expert may be indirect: 'let's make the green one'. Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
    Subjective Attributes in Conversational Recommendation Systems: Challenges and Opportunities
    Ivan Vendrov
    Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI-22) (2022), pp. 12287-12293
    The ubiquity of recommender systems has increased the need for higher-bandwidth, natural and efficient communication with users. This need is increasingly filled by recommenders that support natural language interaction, often conversationally. Given the inherent semantic subjectivity present in natural language, we argue that modeling subjective attributes in recommenders is a critical, yet understudied, avenue of AI research. We propose a novel framework for understanding different forms of subjectivity, examine various recommender tasks that will benefit from a systematic treatment of subjective attributes, and outline a number of research challenges.
    Conversational Music Retrieval with Synthetic Data
    Megan Eileen Leszczynski
    Ravi Ganti
    Shu Zhang
    Arun Tejasvi Chaganty
    Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
    Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach on a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
    On Natural Language User Profiles for Transparent and Scrutable Recommendation
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
    Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.
    Disentangling Preference Representations for Recommendation Critiquing with β-VAE
    30th ACM International Conference on Information and Knowledge Management (CIKM 2021), ACM, New York
    Modern recommender systems usually embed users and items into a learned vector space representation. Similarity in this space is used to generate recommendations, and recommendation methods are agnostic to the structure of the embedding space. Motivated by the need for recommendation systems to be more transparent and controllable, we postulate that it is beneficial to assign meaning to some of the dimensions of user and item representations. Disentanglement is one technique commonly used for this purpose. We present a novel supervised disentangling approach for recommendation tasks. Our model learns embeddings where attributes of interest are disentangled, while requiring only a very small number of labeled items at training time. The model can then generate interactive and critiquable recommendations for all users, without requiring any labels at recommendation time, and without sacrificing any recommendation performance. Our approach thus provides users with levers to manipulate, critique and fine-tune recommendations, and gives insight into why particular recommendations are made. Given only user-item interactions at recommendation time, we show that it identifies user tastes with respect to the attributes that have been disentangled, allowing for users to manipulate recommendations across these attributes.
    Soliciting User Preferences in Conversational Recommender Systems via Usage-related Questions
    Ivica Kostric
    Krisztian Balog
    Proceedings of ACM Conference on Recommender Systems (RecSys ’21) (2021)
    A key distinguishing feature of conversational recommender systems over traditional recommender systems is their ability to elicit user preferences using natural language. Currently, the predominant approach to preference elicitation is to ask questions directly about items or item attributes. These strategies do not perform well in cases where the user does not have sufficient knowledge of the target domain to answer such questions. Conversely, in a shopping setting, talking about the planned use of items does not present any difficulties, even for those that are new to a domain. In this paper, we propose a novel approach to preference elicitation by asking implicit questions based on item usage. Our approach consists of two main steps. First, we identify the sentences from a large review corpus that contain information about item usage. Then, we generate implicit preference elicitation questions from those sentences using a neural text-to-text model. The main contributions of this work also include a multi-stage data annotation protocol using crowdsourcing for collecting high-quality labeled training data for the neural model. We show that our approach is effective in selecting review sentences and transforming them to elicitation questions, even with limited training data.
    Towards Unified Metrics for Accuracy and Diversity for Recommender Systems
    Javier Parapar
    Proceedings of ACM Conference on Recommender Systems (RecSys ’21) (2021)
    Recommender systems evaluation has evolved rapidly in recent years. However, for offline evaluation, accuracy is the de facto standard for assessing the superiority of one method over another, with most research comparisons focused on tasks ranging from rating prediction to ranking metrics for top-n recommendation. Simultaneously, recommendation diversity and novelty have become recognized as critical to users' perceived utility, with several new metrics recently proposed for evaluating these aspects of recommendation lists. Consequently, the accuracy-diversity dilemma frequently shows up as a choice to make when creating new recommendation algorithms. We propose a novel adaptation of a unified metric, derived from one commonly used for search system evaluation, to Recommender Systems. The proposed metric combines topical diversity and accuracy, and we show it to satisfy a set of desired properties that we formulate axiomatically. These axioms are defined as fundamental constraints that a good unified metric should always satisfy. Moreover, beyond the axiomatic analysis, we present an experimental evaluation of the metric with collaborative filtering data. Our analysis shows that the metric respects the desired theoretical constraints and behaves as expected when performing offline evaluation.
    On Interpretation and Measurement of Soft Attributes for Recommendation
    Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21) (2021)
    We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendations settings people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. While binary tagging is extensively studied in the context of recommender systems, soft attributes often involve subjective and contextual aspects, which cannot be captured reliably in this way, nor be represented as objective binary truth in a knowledge base. This also adds important considerations when measuring soft attribute ranking. We propose a more natural representation as personalized relative statements, rather than as absolute item properties. We present novel data collection techniques and evaluation approaches, and a new public dataset. We also propose a set of scoring approaches, from unsupervised to weakly supervised to fully supervised, as a step towards interpreting and acting upon soft attribute based critiques.
    Diverse User Preference Elicitation with Multi-Armed Bandits
    Javier Parapar
    Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM) (2021)
    Personalized recommender systems rely on knowledge of user preferences to produce recommendations. While those preferences are often obtained from past user interactions with the recommendation catalog, in some situations such observations are insufficient or unavailable. The most widely studied case is with new users, although other similar situations arise where explicit preference elicitation is valuable. At the same time, a seemingly disparate challenge is that there is a well known popularity bias in many algorithmic approaches to recommender systems. The most common way of addressing this challenge is diversification, which tends to be applied to the output of a recommender algorithm, prior to items being presented to users. We tie these two problems together, showing a tight relationship. Our results show that popularity bias in preference elicitation contributes to popularity bias in recommendation. In particular, most elicitation methods directly optimize only for the relevance of recommendations that would result from collected preferences. This focus on recommendation accuracy biases the preferences collected. We demonstrate how diversification can instead be applied directly at elicitation time. Our model diversifies the preferences elicited using Multi-Armed Bandits, a classical exploration-exploitation framework from reinforcement learning. This leads to a broader understanding of users' preferences, and improved diversity and serendipity of recommendations, without necessitating post-hoc debiasing corrections.
    “I’d rather just go to bed”: Understanding Indirect Answers
    Dan Roth
    Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020)
    We revisit a pragmatic inference problem in dialog: Understanding indirect responses to questions. Humans can interpret 'I'm starving.' in response to 'Hungry?', even without direct cue words such as 'yes' and 'no'. In dialog systems, allowing natural responses rather than closed vocabularies would be similarly beneficial. However, today's systems are only as sensitive to these pragmatic moves as their language model allows. We create and release the first large-scale English language corpus 'Circa' with 34,268 (polar question, indirect answer) pairs to enable progress on this task. The data was collected via elaborate crowd-sourcing, and contains utterances with yes/no meaning, as well as uncertain, middle-ground, and conditional responses. We also present BERT-based neural models to predict such categories for a question-answer pair. We find that while transfer learning from entailment works reasonably, performance is not yet sufficient for robust dialog. Our models reach 82-88% accuracy for a 4-class distinction, and 74-85% for 6 classes.
    Measuring Recommendation Explanation Quality: The Conflicting Goals of Explanations
    Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20) (2020)
    Explanations have a large effect on how people respond to recommendations. However, there are many possible intentions a system may have in generating explanations for a given recommendation - from increasing transparency, to enabling a faster decision, to persuading the recipient. As a good explanation for one goal may not be good for others, we address the questions of (1) how to robustly measure if an explanation meets a given goal and (2) how the different goals interact with each other. Specifically, this paper presents a first proposal of how to measure the quality of explanations along seven common goal dimensions catalogued in the literature. We find that the seven goals are not independent, but rather exhibit strong structure. Proposing two novel explanation evaluation designs, we identify challenges in evaluation, and provide more efficient measurement approaches of explanation quality.
    Common Conversational Community Prototype: Scholarly Conversational Assistant
    Krisztian Balog
    Lucie Flekova
    Matthias Hagen
    Rosie Jones
    Martin Potthast
    Mark Sanderson
    Svitlana Vakulenko
    Hamed Zamani
    (2020)
    This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This Scholarly Conversational Assistant would serve as a useful tool, a means to create datasets, and a platform for running evaluation challenges by groups across the community. This article results from discussions of a working group at Dagstuhl Seminar 19461 on Conversational Search.
    Transparent, Scrutable and Explainable User Models for Personalized Recommendation
    Shushan Arakelyan
    Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19) (2019)
    Most recommender systems base their recommendations on implicit or explicit item-level feedback provided by users. These item ratings are combined into a complex user model, which then predicts the suitability of other items. While effective, such methods have limited scrutability and transparency. For instance, if a user's interests change, then many item ratings would usually need to be modified to significantly shift the user's recommendations. Similarly, explaining how the system characterizes the user is impossible, short of presenting the entire list of known item ratings. In this paper, we present a new set-based recommendation technique that permits the user model to be explicitly presented to users in natural language, empowering users to understand recommendations made and improve the recommendations dynamically. While performing comparably to traditional collaborative filtering techniques in a standard static setting, our approach allows users to efficiently improve recommendations. Further, it makes it easier for the model to be validated and adjusted, building user trust and understanding.
    Conversational recommendation has recently attracted significant attention. As systems must understand users' preferences, training them has called for conversational corpora, typically derived from task-oriented conversations. We observe that such corpora often do not reflect how people naturally describe preferences. We present a new approach to obtaining user preferences in dialogue: Coached Conversational Preference Elicitation. It allows collection of natural yet structured conversational preferences. Studying the dialogues in one domain, we present a brief quantitative analysis of how people describe movie preferences at scale. Demonstrating the methodology, we release the CCPE-M dataset to the community with over 500 movie preference dialogues expressing over 10,000 preferences.
    Embedding Search into a Conversational Platform to Support Collaborative Search
    Sandeep Avula
    Jaime Arguello
    Robert Capra
    Jordan Dodson
    Yuhui Huang
    Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR'19), pp. 15-23
    Popular messaging platforms such as Slack have given rise to thousands of applications (or bots) that users can engage with individually or as a group. In this paper, we study the use of searchbots (i.e., bots that perform specific types of searches) during collaborative information-seeking tasks mediated through Slack. We report on a user study in which 27 pairs of participants were exposed to three searchbot conditions (a within-subjects design). In the first condition, participants completed the task by searching independently and coordinating through Slack (no searchbot). In the second condition, participants could only search inside of Slack using the searchbot. In the third condition, participants could both search inside of Slack using the searchbot and outside of Slack using their own independent search interfaces. We investigate four research questions focusing on the influence of the searchbot condition on outcomes associated with: (RQ1) participants' levels of workload, (RQ2) collaborative awareness, (RQ3) experiences interacting with the searchbot, and (RQ4) search behaviors. Our results suggest opportunities and challenges in designing searchbots to support collaborative search. On one hand, access to the searchbot resulted in more collaborative awareness, ease of coordination, and fewer duplicated searches. On the other hand, forcing participants to share the querying environment resulted in fewer overall queries, fewer query refinements by individuals, and greater levels of effort. We discuss the implications of our findings for designing effective searchbots to support collaborative search.
    Preference elicitation as an optimization problem
    Anna Sepliarskaia
    Julia Kiseleva
    Maarten de Rijke
    Proceedings of the International Conference on Recommender Systems (RecSys), ACM (2018), pp. 172-180
    The new user coldstart problem arises when a recommender system does not yet have any information about a user. A common solution to it is to generate a profile by asking the user to rate a number of items. Which items are selected determines the quality of the recommendations made, and thus has been studied extensively. We propose a new elicitation method to generate a static preference questionnaire (SPQ) that poses relative preference questions to the user. Using a latent factor model, we show that SPQ improves personalized recommendations by choosing a minimal and diverse set of questions. We are the first to rigorously prove which optimization task should be solved to select each question in static questionnaires. Our theoretical results are confirmed by extensive experimentation. We test the performance of SPQ on two real-world datasets, under two experimental conditions: simulated, when users behave according to a latent factor model (LFM), and real, in which only real user judgments are revealed as the system asks questions. We show that SPQ reduces the necessary length of a questionnaire by up to a factor of three compared to state-of-the-art preference elicitation methods. Moreover, solving the right optimization task, SPQ also performs better than baselines with dynamically generated questions.
    A Theoretical Framework for Conversational Search
    Nick Craswell
    Proceedings of the ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR), ACM (2017), pp. 117-126
    Towards Conversational Recommender Systems
    Katja Hofmann
    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), pp. 815-824
    Where Can I Buy a Boulder?: Searching for Offline Retail Locations
    Sandro Bauer
    Ryen W. White
    Proceedings of the 25th International Conference on World Wide Web (2016), pp. 1225-1235