Craig Boutilier

Craig Boutilier is Principal Scientist at Google. He works on various aspects of decision making under uncertainty, with a current focus on sequential decision models: reinforcement learning, Markov decision processes, temporal models, etc.

Positions and Appointments:
He was a Professor in the Department of Computer Science at the University of Toronto (on leave) and Canada Research Chair in Adaptive Decision Making for Intelligent Systems. He received his Ph.D. in Computer Science from the University of Toronto in 1992, and was an Assistant and then Associate Professor at the University of British Columbia from 1991 until his return to Toronto in 1999. He served as Chair of the Department of Computer Science at Toronto from 2004 to 2010, and was co-founder (with Tyler Lu) of Granata Decision Systems (2012-2015) before moving to Google in 2015.

Boutilier was a consulting professor at Stanford University from 1998 to 2000, an adjunct professor at the University of British Columbia from 1999 to 2010, and a visiting professor at Brown University in 1998, at the University of Toronto in 1997-98, at Carnegie Mellon University in 2008-09, and at Université Paris-Dauphine (Paris IX) in the spring of 2011. He served on the Technical Advisory Board of CombineNet, Inc. from 2001 to 2010.

Research:
Boutilier's current research focuses on various aspects of decision making under uncertainty, including the use of generative models and LLMs, in areas such as recommender systems, preference modeling and elicitation, mechanism design, game theory and multiagent decision processes, economic models, social choice, computational advertising, Markov decision processes, reinforcement learning, and probabilistic inference. His research interests have spanned a wide range of topics, from knowledge representation, belief revision, default reasoning, and philosophical logic, to probabilistic reasoning, decision making under uncertainty, multiagent systems, and machine learning.

Research & Academic Service:
Boutilier is a past Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR). He is also a past Associate Editor of the ACM Transactions on Economics and Computation (TEAC), the Journal of Artificial Intelligence Research (JAIR), the Journal of Machine Learning Research (JMLR), and Autonomous Agents and Multiagent Systems (AAMAS), and he has sat on the editorial/advisory boards of several other journals. Boutilier has organized several international conferences and workshops, including serving as Program Chair of the Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09) and Program Chair of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000). He has also served on the program committees of roughly 75 leading international conferences.

He will serve as Conference Chair of the Thirty-seventh International Joint Conference on Artificial Intelligence (IJCAI-28).

Awards and Honors:
Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM), and the Association for the Advancement of Artificial Intelligence (AAAI). He received the 2018 ACM/SIGAI Autonomous Agents Research Award, and has been awarded a Tier I Canada Research Chair, an Izaak Walton Killam Research Fellowship, and an IBM Faculty Award. He received the Killam Teaching Award from the University of British Columbia in 1997, and has also received a number of Best Paper awards.

Authored Publications
    Embedding-Aligned Language Models
    Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
    Abstract: We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent is trained, using a significantly smaller language model, to iteratively steer the LLM's generation toward optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset, on extrapolation tasks for content gap to satisfy latent user demand, and multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.
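    The following is a minimal, hypothetical sketch of the EAGLE-style loop this abstract describes: a small agent proposes steering actions, a frozen LLM (treated as the RL environment) extends the text, and a reward scores how close the generation's embedding lies to a target region of the latent space. Every component here (the embed function, llm_generate, the greedy lookahead standing in for a learned policy) is a toy assumption, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = rng.normal(size=16)          # toy "optimal region": a single target point

def embed(text: str) -> np.ndarray:
    # Toy embedding: a hash-seeded random vector standing in for a real encoder.
    return np.random.default_rng(abs(hash(text)) % (2**32)).normal(size=16)

def reward(text: str) -> float:
    # Higher when the generation's embedding lies closer to the target region.
    return -float(np.linalg.norm(embed(text) - TARGET))

def llm_generate(text: str, action: str) -> str:
    # Toy environment step; a real system would call the frozen LLM here.
    return f"{text} {action}"

def eagle_episode(prompt: str, actions: list[str], steps: int = 3) -> str:
    """Iteratively steer the generation toward the target embedding region."""
    text = prompt
    for _ in range(steps):
        # Greedy one-step lookahead stands in for the learned RL policy.
        text = max((llm_generate(text, a) for a in actions), key=reward)
    return text

print(eagle_episode("A movie about", ["space", "romance", "heists"]))
```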
    DynaMITE-RL: A Dynamic Model for Improved Temporal Meta Reinforcement Learning
    Anthony Liang
    Erdem Biyik
    Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
    Abstract: We introduce a meta-reinforcement learning (meta-RL) approach, called DynaMITE-RL, to perform approximate inference in environments where the latent information evolves slowly between subtrajectories called sessions. We identify three key modifications to contemporary meta-RL methods: consistency of latent information during sessions, session masking, and prior latent conditioning. We demonstrate the necessity of these modifications on various downstream applications, from discrete Gridworld environments to continuous control and simulated robot assistive tasks, and find that our approach significantly outperforms contemporary baselines.
    Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
    Martin Mladenov
    James Pine
    Hubert Pham
    Shane Li
    Xujian Liang
    Anton Polishko
    Li Yang
    Ben Scheetz
    Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-24), Washington, DC (2024), pp. 2925-2929
    Abstract: Evaluation of policies in recommender systems (RSs) typically involves A/B testing, using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential risk to user retention. In developing policies for onboarding new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live, sometimes more reliably than live experiments due to the scale at which simulation can be realized. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.
    Model-Free Preference Elicitation
    Carlos Martin
    Tuomas Sandholm
    Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, South Korea (2024), pp. 3493-3503
    Abstract: Elicitation of user preferences is becoming an important approach for improving the quality of recommendations, especially when there is little or no user history. In this setting, a recommender system interacts with the user by iteratively presenting elicitation questions and recording their responses. Various criteria have been proposed for optimizing the sequence of queries in order to improve user understanding and thereby the quality of downstream recommendations. A compelling approach for preference elicitation is the Expected Value of Information (EVOI), a Bayesian approach which computes the expected gain in user utility for possible queries. Previous work on EVOI has focused on probabilistic models of users for computing posterior utilities. In contrast, in this work we explore model-free variants of EVOI which rely on function approximation in order to avoid strong modeling assumptions. Specifically, we propose to learn a user response model and a user utility model from data which is often available in real-world systems, and to use these models in EVOI in place of the probabilistic models. We show that our approach leads to improved elicitation performance.
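    A minimal sketch of the EVOI criterion this abstract describes: the expected gain, over possible responses to a query, in the best achievable recommendation utility. The toy response, utility, and update models below stand in for the learned, model-free components the paper proposes; nothing here is the paper's actual API.

```python
def evoi(query, belief, responses, items, response_prob, utility, update):
    """Expected gain in best achievable utility from asking `query`."""
    # Utility of the best recommendation under the current belief.
    baseline = max(utility(belief, item) for item in items)

    # Expected best utility after folding in each possible response.
    expected = 0.0
    for resp in responses:
        p = response_prob(belief, query, resp)     # P(resp | belief, query)
        new_belief = update(belief, query, resp)   # updated user model
        expected += p * max(utility(new_belief, item) for item in items)
    return expected - baseline

# Toy stand-ins: a belief is a weight per item; a "yes" boosts the queried item.
items = ["a", "b", "c"]
belief = {"a": 0.3, "b": 0.5, "c": 0.2}
response_prob = lambda b, q, r: b[q] if r == "yes" else 1 - b[q]
utility = lambda b, item: b[item]
update = lambda b, q, r: {i: (1.0 if i == q else 0.5 * w) if r == "yes"
                          else (0.0 if i == q else w)
                          for i, w in b.items()}

# The next elicitation question is the one maximizing EVOI.
best_query = max(items, key=lambda q: evoi(q, belief, ["yes", "no"], items,
                                           response_prob, utility, update))
```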
    Factual and Personalized Recommendation Language Modeling with Reinforcement Learning
    Jihwan Jeong
    Mohammad Ghavamzadeh
    Proceedings of the First Conference on Language Modeling (COLM-24), Philadelphia (2024)
    Abstract: Recommender systems (RSs) play a central role in connecting users to products, content, and services by matching candidate items to users based on their preferences. While existing RSs often rely on implicit user feedback on recommended items (e.g., clicks, watches, ratings), conversational recommender systems interact with users to provide tailored recommendations in natural language. In this work, we aim to develop a recommender language model (LM) that is capable of generating compelling endorsement presentations of relevant items to users, to better explain the details of the items, to connect the items with users' preferences, and to enhance the likelihood of users accepting recommendations. Specifically, such an LLM-based recommender can understand users' preferences from users' RS embeddings summarizing feedback history, and output corresponding responses that not only are factually grounded, but also explain whether these items satisfy users' preferences in a convincing manner. The pivotal question is how one can gauge the performance of such an LLM recommender. Equipped with a joint reward function that measures factual consistency, convincingness, and personalization, not only can we evaluate the efficacies of different recommender LMs, but we can also utilize this metric as a form of AI feedback to fine-tune our LLM agent via reinforcement learning (RL). Building upon the MovieLens movie recommendation benchmark, we developed a novel conversational recommender delivering personalized movie narratives to users. This work lays the groundwork for recommendation systems that prioritize individualized user experiences without compromising on transparency and integrity.
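    A hypothetical sketch of the joint reward this abstract describes, combining factual consistency, convincingness, and personalization into the scalar used for RL fine-tuning. The toy scorers and weights are illustrative assumptions; in the paper these signals come from learned judges providing AI feedback.

```python
def score_factual(response: str, item_facts: set[str]) -> float:
    # Toy: fraction of claimed attributes that are true of the item.
    words = set(response.lower().split())
    claimed = words & {"thriller", "comedy", "1995", "oscar-winning"}
    return len(claimed & item_facts) / max(len(claimed), 1)

def score_convincing(response: str) -> float:
    # Toy proxy: longer, more concrete endorsements score higher (capped).
    return min(len(response.split()) / 50.0, 1.0)

def score_personalized(response: str, user_prefs: set[str]) -> float:
    # Toy: overlap between the response and the user's stated interests.
    words = set(response.lower().split())
    return len(words & user_prefs) / max(len(user_prefs), 1)

def joint_reward(response: str, item_facts: set[str], user_prefs: set[str],
                 w=(0.4, 0.3, 0.3)) -> float:
    """Scalar reward for RL fine-tuning of the recommender LM."""
    return (w[0] * score_factual(response, item_facts)
            + w[1] * score_convincing(response)
            + w[2] * score_personalized(response, user_prefs))

r = joint_reward("An oscar-winning thriller you will love",
                 item_facts={"thriller", "1995"}, user_prefs={"thriller"})
```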
    Abstract: Personalized recommendation systems are increasingly essential in our information-rich society, aiding users in navigating the expansive online realm. However, accurately modeling the diverse and dynamic interests of the users remains a formidable challenge. Existing user modeling methods, like Single-point User Representation (SUR) and Multi-point User Representation (MUR), have their limitations in terms of accuracy, diversity, computation cost, and adaptability. To overcome these challenges, we introduce a novel model, the Density-based User Representation (DUR), leveraging Gaussian Process Regression (GPR), which has not been extensively explored in multi-interest recommendation and retrieval. Our approach inherently captures user interest dynamics without manual tuning, provides uncertainty-awareness, and is more efficient than point-based representation methods. This paper outlines the development and implementation of GPR4DUR, details its evaluation protocols, and presents extensive analysis demonstrating its effectiveness and efficiency. Experiments on real-world offline datasets confirm our method’s adaptability and efficiency. Further online experiments simulating user behavior illuminate the benefits of our method in the exploration-exploitation trade-off by effectively utilizing model uncertainty.
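    A minimal sketch, in the spirit of GPR4DUR, of representing a user by a Gaussian process fit over the embeddings of consumed items: retrieval can then score candidates by predicted affinity plus uncertainty, supporting the exploration-exploitation trade-off the abstract mentions. The synthetic data, kernel choice, and UCB-style scoring rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Embeddings of items the user interacted with, plus observed affinities.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(20, 8))        # 20 consumed items, 8-dim embeddings
y_hist = rng.uniform(size=20)            # e.g., watch fraction or rating

# The user "representation" is the fitted GP itself: a density over the
# item-embedding space rather than one (SUR) or a few (MUR) points.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(X_hist, y_hist)

# Retrieval scores candidates by predicted affinity plus uncertainty,
# trading off exploitation against exploration.
X_cand = rng.normal(size=(100, 8))
mean, std = gp.predict(X_cand, return_std=True)
scores = mean + 0.5 * std                # UCB-style; 0.5 is an assumed weight
top_k = np.argsort(-scores)[:10]
```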
    Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models
    Martin Mladenov
    Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24), Vancouver (2024) (to appear)
    Abstract: Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors. Despite this, the focus of the majority of recommender research, and of most practical recommenders of any import, is on the local, myopic optimization of the recommendations made to individual users. This comes at a significant cost to the long-term utility that recommenders could generate for their users. We argue that explicitly modeling the incentives and behaviors of all actors in the system, and the interactions among them induced by the recommender's policy, is strictly necessary if one is to maximize the value the system brings to these actors and improve overall ecosystem "health." Doing so requires: optimization over long horizons using techniques such as reinforcement learning; making inevitable tradeoffs among the utility that can be generated for different actors using the methods of social choice; reducing information asymmetry, while accounting for incentives and strategic behavior, using the tools of mechanism design; better modeling of both user and item-provider behaviors by incorporating notions from behavioral economics and psychology; and exploiting recent advances in generative and foundation models to make these mechanisms interpretable and actionable. We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.
    Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful by employing large language models (LLMs) to directly interact with embeddings, transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
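    One plausible way to realize the embedding injection this abstract describes is a small adapter that projects a domain embedding into the LLM's token-embedding space as soft prompt tokens. The sketch below is an assumption-laden illustration (the dimensions, adapter shape, and token count are invented), not the paper's architecture.

```python
import torch
import torch.nn as nn

class EmbeddingAdapter(nn.Module):
    """Projects a domain embedding into `n_tokens` soft tokens for an LLM."""
    def __init__(self, domain_dim: int = 128, model_dim: int = 4096,
                 n_tokens: int = 4):
        super().__init__()
        self.n_tokens, self.model_dim = n_tokens, model_dim
        self.proj = nn.Sequential(
            nn.Linear(domain_dim, model_dim),
            nn.GELU(),
            nn.Linear(model_dim, model_dim * n_tokens),
        )

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # (batch, domain_dim) -> (batch, n_tokens, model_dim); the soft tokens
        # would be prepended to the prompt's token embeddings before decoding,
        # letting the LLM answer queries about the embedded entity.
        return self.proj(e).view(-1, self.n_tokens, self.model_dim)

adapter = EmbeddingAdapter()
soft_tokens = adapter(torch.randn(2, 128))    # shape: (2, 4, 4096)
```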
    Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors
    Christina Göpfert
    Alex Haig
    Ivan Vendrov
    Tyler Lu
    Hubert Pham
    Mohammad Ghavamzadeh
    ACM Transactions on Recommender Systems (2024)
    Abstract: Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue). Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item, and using it to refine recommendation results. Leveraging concept activation vectors (CAVs) (Kim et al., 2018), a recently developed approach for model interpretability in machine learning, we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems. One novel feature of our approach is its ability to distinguish objective and subjective attributes (both subjectivity of degree and of sense), and to associate different senses of subjective attributes with different users. We demonstrate on both synthetic and real-world datasets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations through interactive item critiquing.
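    A minimal sketch of learning a CAV in the sense of Kim et al. (2018): a linear classifier separates representations of items carrying an attribute from random items, and its normalized weight vector gives the attribute's direction, onto which items can be projected for scoring. The data here is synthetic; per-user classifiers could capture the different subjective senses the abstract mentions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(loc=0.5, size=(50, 16))   # embeddings of items with the attribute
neg = rng.normal(loc=0.0, size=(50, 16))   # embeddings of random items

X = np.vstack([pos, neg])
y = np.array([1] * 50 + [0] * 50)

# The CAV is the unit normal of the separating hyperplane.
clf = LogisticRegression().fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Score any item's degree of the attribute by projecting onto the CAV.
scores = X @ cav
```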
    Building Human Values into Recommender Systems: An Interdisciplinary Synthesis and Open Problems
    Jonathan Stray
    Alon Halevy
    Parisa Assar
    Dylan Hadfield-Menell
    Amar Ashar
    Chloe Bakalar
    Lex Beattie
    Michael Ekstrand
    Claire Leibowicz
    Connie Moon Sehat
    Sara Johansen
    Lianne Kerlin
    David Vickrey
    Spandana Singh
    Sanne Vrijenhoek
    Amy Zhang
    McKane Andrus
    Natali Helberger
    Polina Proutskova
    Tanushree Mitra
    Nina Vasan
    ACM Transactions on Recommender Systems (2023)
    Abstract: Recommender systems are the algorithms which select, filter, and personalize content across many of the world’s largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. The overarching question that arises is whether recommender systems align with the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and practice, but critically depends on insights from diverse fields including social science, ethics, economics, psychology, policy, and law. This paper is a multidisciplinary effort to define a common language for addressing questions around human-value alignment for recommender systems, to synthesize the state of practice and insights from different perspectives, and to propose open problems. We propose a set of values that seem most relevant to recommender systems operating across different domains. We look at values from three different perspectives: 1) measurement, which is a key element of operationalizing values; 2) design, reflecting current approaches and open challenges to implementing these values; and 3) policy, the regulatory approaches which could provide appropriate incentives and standards for recommender system operators.