Krisztian Balog

Krisztian Balog is a staff research scientist at Google. His research interests include conversational information access and evaluation. More of his publications are listed on Google Scholar.
Authored Publications
    Measuring the Impact of Explanation Bias: A Study of Natural Language Justifications for Recommender Systems
    Andrey Petrov
    Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23), ACM
    Despite the potential impact of explanations on decision making, there is a lack of research on quantifying the effect of explanations on users' choices. This paper presents an experimental protocol for measuring the degree to which positively or negatively biased explanations can lead to users choosing suboptimal recommendations. Key elements of this protocol include a preference elicitation stage to allow for personalizing recommendations, manual identification and extraction of item aspects from reviews, and a controlled method for introducing bias through the combination of both positive and negative aspects. We also present explanations in two different textual formats: as a list of item aspects and as fluent natural language text. Through a user study with 129 participants, we demonstrate that explanations can significantly affect users' selections and that these findings generalize across explanation formats.
    Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
    Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset
    Arun Chaganty
    Megan Leszczynski
    Shu Zhang
    Ravi Ganti
    Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)
    Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an item set exponentiates the search space that recommender systems must consider (all subsets of items!): this motivates conversational approaches, where users explicitly state or refine their preferences and systems elicit preferences in natural language, as an efficient way to understand user needs. We call this task conversational item set curation and present a novel data collection methodology that efficiently collects realistic preferences about item sets in a conversational setting by observing both item-level and set-level feedback. We apply this methodology to music recommendation to build the Conversational Playlist Curation Dataset (CPCD), where we show that it leads raters to express preferences that would not be otherwise expressed. Finally, we propose a wide range of conversational retrieval models as baselines for this task and evaluate them on the dataset.
    Conversational Music Retrieval with Synthetic Data
    Megan Eileen Leszczynski
    Ravi Ganti
    Shu Zhang
    Arun Tejasvi Chaganty
    Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
    Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach on a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
    On Natural Language User Profiles for Transparent and Scrutable Recommendation
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
    Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.
    On Interpretation and Measurement of Soft Attributes for Recommendation
    Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21) (2021)
    We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendation settings people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. While binary tagging is extensively studied in the context of recommender systems, soft attributes often involve subjective and contextual aspects, which cannot be captured reliably in this way, nor be represented as objective binary truth in a knowledge base. This also adds important considerations when measuring soft attribute ranking. We propose a more natural representation as personalized relative statements, rather than as absolute item properties. We present novel data collection techniques and evaluation approaches, and a new public dataset. We also propose a set of scoring approaches, from unsupervised to weakly supervised to fully supervised, as a step towards interpreting and acting upon soft attribute based critiques.
    Measuring Recommendation Explanation Quality: The Conflicting Goals of Explanations
    Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20) (2020)
    Explanations have a large effect on how people respond to recommendations. However, there are many possible intentions a system may have in generating explanations for a given recommendation - from increasing transparency, to enabling a faster decision, to persuading the recipient. As a good explanation for one goal may not be good for others, we address the questions of (1) how to robustly measure if an explanation meets a given goal and (2) how the different goals interact with each other. Specifically, this paper presents a first proposal of how to measure the quality of explanations along seven common goal dimensions catalogued in the literature. We find that the seven goals are not independent, but rather exhibit strong structure. Proposing two novel explanation evaluation designs, we identify challenges in evaluation, and provide more efficient measurement approaches of explanation quality.
    Transparent, Scrutable and Explainable User Models for Personalized Recommendation
    Shushan Arakelyan
    Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19) (2019)
    Most recommender systems base their recommendations on implicit or explicit item-level feedback provided by users. These item ratings are combined into a complex user model, which then predicts the suitability of other items. While effective, such methods have limited scrutability and transparency. For instance, if a user's interests change, then many item ratings would usually need to be modified to significantly shift the user's recommendations. Similarly, explaining how the system characterizes the user is impossible, short of presenting the entire list of known item ratings. In this paper, we present a new set-based recommendation technique that permits the user model to be explicitly presented to users in natural language, empowering users to understand recommendations made and improve the recommendations dynamically. While performing comparably to traditional collaborative filtering techniques in a standard static setting, our approach allows users to efficiently improve recommendations. Further, it makes it easier for the model to be validated and adjusted, building user trust and understanding.
    Personal Knowledge Graphs: A Research Agenda
    Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR), ACM (2019)
    Knowledge graphs, organizing structured information about entities, and their attributes and relationships, are ubiquitous today. Entities, in this context, are usually taken to be anyone or anything considered to be globally important. This, however, rules out many entities people interact with on a daily basis. In this position paper, we present the concept of personal knowledge graphs: resources of structured information about entities personally related to its user, including the ones that might not be globally important. We discuss key aspects that separate them from general knowledge graphs, identify the main challenges involved in constructing and using them, and define a research agenda.
    Conversational recommendation has recently attracted significant attention. As systems must understand users' preferences, training them has called for conversational corpora, typically derived from task-oriented conversations. We observe that such corpora often do not reflect how people naturally describe preferences. We present a new approach to obtaining user preferences in dialogue: Coached Conversational Preference Elicitation. It allows collection of natural yet structured conversational preferences. Studying the dialogues in one domain, we present a brief quantitative analysis of how people describe movie preferences at scale. Demonstrating the methodology, we release the CCPE-M dataset to the community with over 500 movie preference dialogues expressing over 10,000 preferences.