Krisztian Balog
Krisztian Balog is a staff research scientist at Google. His research interests include conversational information access and evaluation. More of his publications are listed on Google Scholar.
Authored Publications
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions
Barbara Ikica
Hamidreza Alvari
Mehdi Hafezi Manshadi
(2024)
The emergence of synthetic data represents a pivotal shift in modern machine learning, offering a solution to satisfy the need for large volumes of data in domains where real data is scarce, highly private, or difficult to obtain. We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content, noting that such content is increasingly prevalent and a source of frequently sought information. Large language models (LLMs) offer a starting point for generating synthetic social media discussion threads, due to their ability to produce diverse responses that typify online interactions. However, as we demonstrate, straightforward application of LLMs yields limited success in capturing the complex structure of online discussions, and standard prompting mechanisms lack sufficient control. We therefore propose a multi-step generation process, predicated on the idea of creating compact representations of discussion threads, referred to as scaffolds. Our framework is generic yet adaptable to the unique characteristics of specific social media platforms. We demonstrate its feasibility using data from two distinct online discussion platforms. To address the fundamental challenge of ensuring the representativeness and realism of synthetic data, we propose a portfolio of evaluation measures to compare various instantiations of our framework.
Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
Scott Sanner
Proceedings of ACM Conference on Recommender Systems (RecSys ’23) (2023) (to appear)
Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset
Arun Chaganty
Megan Leszczynski
Shu Zhang
Ravi Ganti
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)
Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g. a playlist or radio) than over single items (e.g. songs). Unfortunately, this is an underexplored area of research, with most existing recommendation systems limited to understanding preferences over single items. Curating an item set exponentiates the search space that recommender systems must consider (all subsets of items!): this motivates conversational approaches, where users explicitly state or refine their preferences and systems elicit preferences in natural language, as an efficient way to understand user needs. We call this task conversational item set curation and present a novel data collection methodology that efficiently collects realistic preferences about item sets in a conversational setting by observing both item-level and set-level feedback. We apply this methodology to music recommendation to build the Conversational Playlist Curation Dataset (CPCD), where we show that it leads raters to express preferences that would not be otherwise expressed. Finally, we propose a wide range of conversational retrieval models as baselines for this task and evaluate them on the dataset.
Measuring the Impact of Explanation Bias: A Study of Natural Language Justifications for Recommender Systems
Andrey Petrov
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23), ACM
Despite the potential impact of explanations on decision making, there is a lack of research on quantifying the effect of explanations on users' choices. This paper presents an experimental protocol for measuring the degree to which positively or negatively biased explanations can lead to users choosing suboptimal recommendations. Key elements of this protocol include a preference elicitation stage to allow for personalizing recommendations, manual identification and extraction of item aspects from reviews, and a controlled method for introducing bias through the combination of both positive and negative aspects. We also present explanations in two different textual formats: as a list of item aspects and as fluent natural language text. Through a user study with 129 participants, we demonstrate that explanations can significantly affect users' selections and that these findings generalize across explanation formats.
Conversational Music Retrieval with Synthetic Data
Megan Eileen Leszczynski
Ravi Ganti
Shu Zhang
Arun Tejasvi Chaganty
Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022
Users looking for recommendations often wish to improve suggestions through broad natural language feedback (e.g., “How about something more upbeat?”). However, building such conversational retrieval systems requires conversational data with rich user utterances paired with slates of items that cover a diverse range of preferences. This is challenging to collect scalably using conventional methods like crowd-sourcing. We address this problem with a new technique to synthesize high-quality dialog data by transforming the domain expertise encoded in curated item collections into corresponding item-seeking conversations. The method first generates a sequence of hypothetical slates returned by a system, and then uses a language model to introduce corresponding user utterances. We apply the approach on a dataset of curated music playlists to generate 10k diverse music-seeking conversations. A qualitative human evaluation shows that a majority of these conversations express believable sequences of slates and include user utterances that faithfully express preferences for them. When used to train a conversational retrieval model, the synthetic data yields up to a 23% relative gain on standard retrieval metrics compared to baselines trained on non-conversational and conversational datasets.
On Natural Language User Profiles for Transparent and Scrutable Recommendation
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed.
We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.
On Interpretation and Measurement of Soft Attributes for Recommendation
Alexandros Karatzoglou
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21) (2021)
We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendation settings, people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. While binary tagging is extensively studied in the context of recommender systems, soft attributes often involve subjective and contextual aspects, which cannot be captured reliably in this way, nor be represented as objective binary truth in a knowledge base. This also adds important considerations when measuring soft attribute ranking. We propose a more natural representation as personalized relative statements, rather than as absolute item properties. We present novel data collection techniques and evaluation approaches, and a new public dataset. We also propose a set of scoring approaches, from unsupervised to weakly supervised to fully supervised, as a step towards interpreting and acting upon soft-attribute-based critiques.
Measuring Recommendation Explanation Quality: The Conflicting Goals of Explanations
Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20) (2020)
Explanations have a large effect on how people respond to recommendations. However, there are many possible intentions a system may have in generating explanations for a given recommendation - from increasing transparency, to enabling a faster decision, to persuading the recipient. As a good explanation for one goal may not be good for others, we address the questions of (1) how to robustly measure if an explanation meets a given goal and (2) how the different goals interact with each other. Specifically, this paper presents a first proposal of how to measure the quality of explanations along seven common goal dimensions catalogued in the literature. We find that the seven goals are not independent, but rather exhibit strong structure. Proposing two novel explanation evaluation designs, we identify challenges in evaluation, and provide more efficient measurement approaches of explanation quality.
Transparent, Scrutable and Explainable User Models for Personalized Recommendation
Shushan Arakelyan
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19) (2019)
Most recommender systems base their recommendations on implicit or explicit item-level feedback provided by users. These item ratings are combined into a complex user model, which then predicts the suitability of other items. While effective, such methods have limited scrutability and transparency. For instance, if a user's interests change, then many item ratings would usually need to be modified to significantly shift the user's recommendations. Similarly, explaining how the system characterizes the user is impossible, short of presenting the entire list of known item ratings. In this paper, we present a new set-based recommendation technique that permits the user model to be explicitly presented to users in natural language, empowering users to understand recommendations made and improve the recommendations dynamically. While performing comparably to traditional collaborative filtering techniques in a standard static setting, our approach allows users to efficiently improve recommendations. Further, it makes it easier for the model to be validated and adjusted, building user trust and understanding.
Coached Conversational Preference Elicitation: A Case Study in Understanding Movie Preferences
Proceedings of the Annual SIGdial Meeting on Discourse and Dialogue (2019)
Conversational recommendation has recently attracted significant attention. As systems must understand users' preferences, training them has called for conversational corpora, typically derived from task-oriented conversations. We observe that such corpora often do not reflect how people naturally describe preferences.
We present a new approach to obtaining user preferences in dialogue: Coached Conversational Preference Elicitation. It allows collection of natural yet structured conversational preferences. Studying the dialogues in one domain, we present a brief quantitative analysis of how people describe movie preferences at scale. Demonstrating the methodology, we release the CCPE-M dataset to the community with over 500 movie preference dialogues expressing over 10,000 preferences.