Guy Tennenholtz
Guy Tennenholtz is a research scientist at Google Research. He received his Ph.D. from the Technion Institute of Technology in 2022. He has published over 20 papers in major machine learning conferences. His research focuses include reinforcement learning and causal inference with applications in ecosystems in recommender systems, healthcare, and robotics.
Research Areas
Authored Publications
Sort By
ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
Jihwan Jeong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL-26), Rabat, Morocco (2026), pp. 5270-5304
Preview abstract
LLM-based user simulators are a scalable solution for improving conversational AI, but a critical realism gap undermines their effectiveness. To close this gap, we introduce a framework for building and validating high-fidelity simulators. We present a novel dataset of human-AI shopping conversations designed to capture a wide spectrum of user experiences. To measure fidelity, we propose a hybrid evaluation protocol that combines statistical alignment with a learned, discriminator-based Human-Likeness Score. Our most sophisticated simulator, trained via reinforcement learning with iterative critique, achieves a significant leap in realism. Critically, we demonstrate through counterfactual validation that our simulator—trained exclusively on optimal interactions—realistically adapts its behavior to suboptimal system responses, mirroring real user reactions and marking a key advance in creating reliable simulators for robust AI development.
View details
ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
Jihwan Jeong
The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL-26), Rabat, Morocco (2026)
Preview abstract
LLM-based user simulators are a scalable solution for improving conversational AI, but a critical realism gap undermines their effectiveness. To close this gap, we introduce a framework for building and validating high-fidelity simulators. We present a novel dataset of human-AI shopping conversations designed to capture a wide spectrum of user experiences. To measure fidelity, we propose a hybrid evaluation protocol that combines statistical alignment with a learned, discriminator-based Human-Likeness Score. Our most sophisticated simulator, trained via reinforcement learning with iterative critique, achieves a significant leap in realism. Critically, we demonstrate through counterfactual validation that our simulator—trained exclusively on optimal interactions—realistically adapts its behavior to suboptimal system responses, mirroring real user reactions and marking a key advance in creating reliable simulators for robust AI development.
View details
Asking Clarifying Questions for Preference Elicitation with Large Language Models
Ali Montazer
1st Workshop on Next Generation of IR and Recommender Systems with Language Agents, Generative Models, and Conversational AI (GENNEXT@SIGIR'25), Padua, IT (2025)
Preview abstract
Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge, as even advanced LLMs still struggle with this task.
To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models: starting from a user profile, in a forward process we generate clarifying questions, obtain answers, and then remove the corresponding information from the user profile, which is analogous to adding noise to the user profile. In the reverse process, zour model learns to “denoise” the user profile by learning to ask effective clarifying questions. Our results show that our method significantly boosts the LLM’s proficiency in asking funnel questions and elicit user preferences effectively.
View details
Preference Adaptive and Sequential Text-to-Image Generation
Ofir Nabati
Moonkyung Ryu
Sean Li
42nd International Conference on Machine Learning (ICML-25), Vancouver (2025), pp. 45362-45394
Preview abstract
We consider the problem of sequential text-to-image generation. Specifically, we formulate a personalized interactive framework, where an agent iteratively improves a user's prompt through a series of sequential prompt expansions. We formulate the problem as a sequential decision-making task. Using human raters, we create a dataset of sequential preferences for this problem. We then leverage our sequential data, together with large-scale open-source non-sequential datasets to construct user-preference and user-choice models. Particularly, we employ an EM strategy to develop a personalized sequential user model. We then leverage a multi-modal large language model (MM-LLM) and a value-based reinforcement learning (RL) agent to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) empowers diffusion models with personalized multi-turn capabilities, fostering collaborative co-creation, and addressing uncertainties or under-specifications in user intent. We evaluate our agent using human raters, showing significant improvement compared to baseline methods. We also release our sequential rater dataset and additional simulated data of user-agent interactions to advance future research in personalized multi-turn text-to-image generation.
View details
Preference Adaptive and Sequential Text-to-Image Generation
Ofir Nabati
Moonkyung Ryu
Sean Li
2025
Preview abstract
We address the problem of interactive text-toimage (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with largescale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varyinguser preference types. We then leverage a large multimodal language model (LMM) and a valuebased RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-toimage Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user’s intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
View details
Preview abstract
Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge, as even advanced LLMs still struggle with this task.
To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models: starting from a user profile, in a forward process we generate clarifying questions, obtain answers, and then remove the corresponding information from the user profile, which is analogous to adding noise to the user profile. In the reverse process, zour model learns to “denoise” the user profile by learning to ask effective clarifying questions. Our results show that our method significantly boosts the LLM’s proficiency in asking funnel questions and elicit user preferences effectively.
View details
Preference Adaptive and Sequential Text-to-Image Generation
Ofir Nabati
Moonkyung Ryu
Sean Li
2025
Preview abstract
We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.
View details
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta Reinforcement Learning
Anthony Liang
Erdem Biyik
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Preview abstract
We introduce a meta-reinforcement learning (meta-RL) approach, called DynaMITE-RL, to perform approximate inference in environments where the latent information evolves slowly between subtrajectories called sessions.
We identify three key modifications to contemporary meta-RL methods: consistency of latent information during sessions, session masking, and prior latent conditioning.
We demonstrate the necessity of these modifications on various downstream applications from discrete Gridworld environments to continuous control and simulated robot assistive tasks and find that our approach significantly outperforms contemporary baselines.
View details
Demystifying Embedding Spaces using Large Language Models
Jihwan Jeong
Lior Shani
Aza Tulepbergenov
Martin Mladenov
The Twelfth International Conference on Learning Representations (2024)
Preview abstract
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
View details
Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models
Martin Mladenov
Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24), Vancouver (2024) (to appear)
Preview abstract
Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors. Despite this, the focus of the majority of recommender research---and most practical recommenders of any import---is on the \emph{local, myopic} optimization of the recommendations made to individual users. This comes at a significant cost to the \emph{long-term utility} that recommenders could generate for its users. We argue that explicitly modeling the incentives and behaviors of all actors in the system---and the interactions among them induced by the recommender's policy---is strictly necessary if one is to maximize the value the system brings to these actors and improve overall ecosystem ``health.'' Doing so requires: optimization over long horizons using techniques such as \emph{reinforcement learning}; making inevitable tradeoffs among the utility that can be generated for different actors using the methods of \emph{social choice}; reducing information asymmetry, while accounting for incentives and strategic behavior, using the tools of \emph{mechanism design}; better modeling of both user and item-provider behaviors by incorporating notions from \emph{behavioral economics and psychology}; and exploiting recent advances in \emph{generative and foundation models} to make these mechanisms interpretable and actionable. We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.
View details