Craig Boutilier

Craig Boutilier is Principal Scientist at Google. He works on various aspects of decision making under uncertainty, with a current focus on sequential decision models: reinforcement learning, Markov decision processes, temporal models, etc.

Positions and Appointments:
He was a Professor in the Department of Computer Science at the University of Toronto (on leave) and Canada Research Chair in Adaptive Decision Making for Intelligent Systems. He received his Ph.D. in Computer Science from the University of Toronto in 1992, and worked as an Assistant and Associate Professor at the University of British Columbia from 1991 until his return to Toronto in 1999. He served as Chair of the Department of Computer Science at Toronto from 2004-2010. He was co-founder (with Tyler Lu) of Granata Decision Systems from 2012-2015, until his move to Google in 2015.

Boutilier was a consulting professor at Stanford University from 1998-2000, an adjunct professor at the University of British Columbia from 1999-2010, and a visiting professor at Brown University in 1998, at the University of Toronto in 1997-98, at Carnegie Mellon University in 2008-09, and at Université Paris-Dauphine (Paris IX) in the spring of 2011. He served on the Technical Advisory Board of CombineNet, Inc. from 2001 to 2010.

Boutilier's current research efforts focus on various aspects of decision making under uncertainty, including the use of generative models and LLMs, in areas such as: recommender systems, preference modeling and elicitation, mechanism design, game theory and multiagent decision processes, economic models, social choice, computational advertising, Markov decision processes, reinforcement learning and probabilistic inference. His research interests have spanned a wide range of topics, from knowledge representation, belief revision, default reasoning, and philosophical logic, to probabilistic reasoning, decision making under uncertainty, multiagent systems, and machine learning.

Research & Academic Service:
Boutilier is a past Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). He is also a past Associate Editor of the ACM Transactions on Economics and Computation (TEAC), the Journal of Artificial Intelligence Research (JAIR), the Journal of Machine Learning Research (JMLR), and Autonomous Agents and Multiagent Systems (AAMAS), and he has sat on the editorial/advisory boards of several other journals. Boutilier has organized several international conferences and workshops, serving as Program Chair of the Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09) and Program Chair of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000). He has also served on the program committees of roughly 75 leading international conferences.

Awards and Honors:
Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM) and the Association for the Advancement of Artificial Intelligence (AAAI). He was the recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award. He was awarded a Tier I Canada Research Chair, an Izaak Walton Killam Research Fellowship, and an IBM Faculty Award. He received the Killam Teaching Award from the University of British Columbia in 1997. He has also received a number of Best Paper awards including:

Authored Publications
    Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings, transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
    Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models
    Martin Mladenov
    Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24), Vancouver (2024) (to appear)
    Abstract: Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors. Despite this, the focus of the majority of recommender research (and of most practical recommenders of any import) is on the local, myopic optimization of the recommendations made to individual users. This comes at a significant cost to the long-term utility that recommenders could generate for their users. We argue that explicitly modeling the incentives and behaviors of all actors in the system, and the interactions among them induced by the recommender's policy, is strictly necessary if one is to maximize the value the system brings to these actors and improve overall ecosystem "health." Doing so requires: optimization over long horizons using techniques such as reinforcement learning; making inevitable tradeoffs among the utility that can be generated for different actors using the methods of social choice; reducing information asymmetry, while accounting for incentives and strategic behavior, using the tools of mechanism design; better modeling of both user and item-provider behaviors by incorporating notions from behavioral economics and psychology; and exploiting recent advances in generative and foundation models to make these mechanisms interpretable and actionable. We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.
    Model-Free Preference Elicitation
    Carlos Martin
    Tuomas Sandholm
    Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, South Korea (2024) (to appear)
    Abstract: Elicitation of user preferences is becoming an important approach for improving the quality of recommendations, especially when there is little or no user history. In this setting, a recommender system interacts with the user by iteratively presenting elicitation questions and recording their responses. Various criteria have been proposed for optimizing the sequence of queries in order to improve user understanding and thereby the quality of downstream recommendations. A compelling approach for preference elicitation is the Expected Value of Information (EVOI), a Bayesian approach which computes the expected gain in user utility for possible queries. Previous work on EVOI has focused on probabilistic models of users for computing posterior utilities. In contrast, in this work we explore model-free variants of EVOI which rely on function approximations in order to avoid strong modeling assumptions. Specifically, we propose to learn a user response model and a user utility model from data which is often available in real-world systems, and to use these models in EVOI in place of the probabilistic models. We show that our approach leads to improved elicitation performance.
    Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors
    Christina Göpfert
    Alex Haig
    Ivan Vendrov
    Tyler Lu
    Hubert Pham
    Mohammad Ghavamzadeh
    ACM Transactions on Recommender Systems (2024)
    Abstract: Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue). Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item, and using it to refine recommendation results. Leveraging concept activation vectors (CAVs) (Kim et al., 2018), a recently developed approach for model interpretability in machine learning, we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems. One novel feature of our approach is its ability to distinguish objective and subjective attributes (both subjectivity of degree and of sense), and associate different senses of subjective attributes with different users. We demonstrate on both synthetic and real-world data sets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations through interactive item critiquing.
    Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
    Martin Mladenov
    James Pine
    Hubert Pham
    Shane Li
    Xujian Liang
    Anton Polishko
    Ben Scheetz
    Proceedings of the 47th International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR-24), Washington, DC (2024) (to appear)
    Abstract: Evaluation of policies in recommender systems (RSs) typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for onboarding new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live, sometimes more reliably than live experiments due to the scale at which simulation can be realized. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.
    Building Human Values into Recommender Systems: An Interdisciplinary Synthesis and Open Problems
    Jonathan Stray
    Alon Halevy
    Parisa Assar
    Dylan Hadfield-Menell
    Amar Ashar
    Chloe Bakalar
    Lex Beattie
    Michael Ekstrand
    Claire Leibowicz
    Connie Moon Sehat
    Sara Johansen
    Lianne Kerlin
    David Vickrey
    Spandana Singh
    Sanne Vrijenhoek
    Amy Zhang
    McKane Andrus
    Natali Helberger
    Polina Proutskova
    Tanushree Mitra
    Nina Vasan
    ACM Transactions on Recommender Systems (2023)
    Abstract: Recommender systems are the algorithms which select, filter, and personalize content across many of the world’s largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. The overarching question that arises is whether recommender systems align with the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and their practice, but critically depends on insights from diverse fields including social science, ethics, economics, psychology, policy and law. This paper is a multidisciplinary effort to define a common language for addressing questions around human-value alignment for recommender systems, to synthesize the state of practice and insights from different perspectives, and to propose open problems. We propose a set of values that seem most relevant to recommender systems operating across different domains. We look at values from three different perspectives: 1) measurement, which is a key element of operationalizing values, 2) design, reflecting current approaches and open challenges to implementing these values, and 3) policy, the regulatory approaches which could provide appropriate incentives and standards for recommender system operators.
    Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
    Dhawal Gupta
    Mohammad Ghavamzadeh
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS-23), New Orleans (2023)
    Abstract: Reinforcement learning (RL) has offered great promise for developing dialogue management (DM) agents that avoid being short-sighted, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in deep RL and language models (LMs), using RL to power conversational chatbots remains a formidable challenge. This is because deep RL algorithms require online exploration to learn effectively, but collecting fresh human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action space that these algorithms need to handle, as most LM agents generate responses at the word level. Leveraging the recent advances of Mixture-of-Expert Language Models (MoE-LMs) that capture diverse semantics, generate utterances of different intents, and are amenable to multi-turn DM, we develop a gamut of offline RL algorithms that excel in dialogue planning. By exploiting the MoE-LM structure, our methods significantly reduce the action space and improve the efficacy of RL DM. We compare them with SOTA methods on open-domain dialogues to demonstrate their effectiveness both in the diversity of generated utterances and the overall DM performance.
    A Mixture-of-Expert Approach to RL-based Dialogue Management
    Ofir Nachum
    Dhawal Gupta
    Moonkyung Ryu
    Mohammad Ghavamzadeh
    Proceedings of the Eleventh International Conference on Learning Representations (ICLR-23), Kigali, Rwanda (2023)
    Abstract: Despite recent advancements in language models (LMs), their application to dialogue management (DM) and ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (often outputting generic utterances) and maximizes overall user satisfaction. However, existing RL approaches focus on training an agent that operates at the word level. Since generating semantically correct and sensible utterances from a large vocabulary space is combinatorially complex, RL can struggle to produce engaging dialogue, even if warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-expert (MoE) approach, which consists of (i) a language representation that captures diverse information, (ii) several modulated LMs (or experts) to generate candidate utterances, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. This MoE approach provides greater flexibility to generate sensible utterances of different intents, and allows RL to focus on conversational-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in the diversity and sensibility of the generated utterances as well as the overall DM performance.
    Reinforcement Learning with History Dependent Dynamic Contexts
    Nadav Merlis
    Martin Mladenov
    Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, Hawaii
    Abstract: We introduce a framework for modeling and solving reinforcement learning problems in non-Markovian, history-dependent environments. Our framework, called the Dynamic Contextual Markov Decision Process (DCMDP), generalizes the contextual MDP framework to handle non-Markov environments where contexts change over time. To overcome the exponential dependence on history, we leverage an aggregated mapping of previous visits to states, actions and contexts to construct an optimistic upper confidence-based algorithm, for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm that addresses history-dependent contexts, by planning in a latent space and using optimism over history-dependent features. We demonstrate the efficiency and performance of our approach on a recommendation task using the MovieLens dataset, in which the user's behavior is influenced by the agent's recommendations and changes over time.
    Fine-tuning Text-to-Image Diffusion Models via Reinforcement Learning from Human Feedback
    Ying Fan
    Olivia Watkins
    Yuqing Du
    Hao Liu
    Moonkyung Ryu
    Pieter Abbeel
    Mohammad Ghavamzadeh
    Kangwook Lee
    Kimin Lee
    Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS-23), New Orleans (2023)
    Abstract: Despite significant progress in text-to-image synthesis, current models often produce images that do not align well with text prompts. To overcome this challenge, recent works have collected a large dataset of human feedback and trained a reward function that aligns with human evaluations. However, optimizing text-to-image models to maximize this reward function remains a challenging problem. In this work, we investigate reinforcement learning (RL) to fine-tune text-to-image models. Specifically, we define the fine-tuning task as an RL problem, tailored for diffusion models. We then update the pre-trained text-to-image diffusion models using a policy gradient algorithm to maximize the scores of the reward model, based on human feedback. We investigate several design choices, such as KL regularization, value learning, and balancing regularization coefficients, and find that careful consideration of these design choices is crucial for effective RL fine-tuning. Our experiments demonstrate that RL fine-tuning is more effective in improving pre-trained models than supervised fine-tuning, in terms of both alignment and fidelity.