Synthetic Dialogue Generation for Interactive Conversational Elicitation & Recommendation (ICER)
Abstract
Large language models (LLMs), despite their success in conducting natural conversations and solving many challenging NLP tasks, are not explicitly trained to elicit and understand a user's preferences and then make recommendations based on the preferences learned through those interactions. A primary obstacle to pursuing this task is the lack of rich conversational recommendation datasets (of user/agent dialogs) that cover diverse preference-elicitation scenarios in addition to item recommendations. While several standard conversational recommender datasets do contain dialogs in which the agent queries the user for preferences, these queries are often restricted to particular key-phrases, limiting users to expressing preferences over a fixed set of item features (e.g., location, price, rating). On the other hand, hiring human raters to create a personalized conversational recommendation dataset with rich preference queries (using various abstract, preference-describing attributes) and recommendations is an expensive, error-prone, and time-consuming process: it requires collecting data for numerous personalized recommendation scenarios, each of which may admit an exponential number of possible preference-elicitation interactions (with many different queries). We propose a synthetic data-generation methodology in which a recommendation dialog simulator produces templatized dialogs, and the Gemini Ultra LLM then in-paints these templates so that the resulting dialogs read as linguistically natural. The simulator generates recommendation dialogs that are consistent with user preferences represented as embeddings.
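To make the template-then-in-paint idea concrete, the following is a minimal sketch in Python. All names here (UserProfile, render_template, inpaint_with_llm, and the stubbed llm_call) are hypothetical illustrations of the general approach, not the paper's actual implementation or the real simulator/Gemini API.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    """Simulated user whose preferences are represented by an embedding vector (assumption)."""
    name: str
    preference_embedding: list[float]

def render_template(slots: dict) -> list[str]:
    """Fill a fixed dialogue skeleton with attribute/item slot values (hypothetical template)."""
    return [
        f"AGENT: Are you looking for something that is {slots['attribute']}?",
        f"USER: Yes, I'd prefer {slots['preferred_value']}.",
        f"AGENT: Then I recommend {slots['item']}.",
    ]

def inpaint_with_llm(templatized_turns: list[str], llm_call) -> str:
    """Ask an LLM to rewrite the stilted template into natural language while
    preserving the stated preferences and the final recommendation."""
    prompt = (
        "Rewrite the following recommendation dialogue so it sounds like a natural "
        "conversation. Keep every stated preference and the recommendation unchanged.\n\n"
        + "\n".join(templatized_turns)
    )
    return llm_call(prompt)  # in practice, a Gemini text-generation call would go here

# Example usage with a stubbed LLM call (identity function) so the sketch is self-contained:
dialog = render_template(
    {"attribute": "family friendly", "preferred_value": "animated films", "item": "Movie X"}
)
natural_dialog = inpaint_with_llm(dialog, llm_call=lambda p: p)
```

The key design point sketched here is the separation of concerns: the simulator controls dialog structure and preference consistency through templates, while the LLM is used only for surface realization, so in-painting cannot change which preferences were elicited or which item was recommended.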