Preference Adaptive and Sequential Text-to-Image Generation
Abstract
We consider the problem of sequential text-to-image generation, formulating a personalized interactive framework in which an agent iteratively improves a user's prompt through a series of prompt expansions. We cast this process as a sequential decision-making task. Using human raters, we create a dataset of sequential preferences for this problem. We then leverage this sequential data, together with large-scale open-source non-sequential datasets, to construct user-preference and user-choice models; in particular, we employ an EM strategy to learn a personalized sequential user model. We then combine a multi-modal large language model (MM-LLM) with a value-based reinforcement learning (RL) agent to suggest a personalized and diverse slate of prompt expansions to the user. Our Personalized And Sequential Text-to-image Agent (PASTA) endows diffusion models with personalized multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or under-specification in user intent. We evaluate our agent with human raters, showing significant improvement over baseline methods. We also release our sequential rater dataset, along with additional simulated data of user-agent interactions, to advance future research in personalized multi-turn text-to-image generation.