Yinlam Chow
Yinlam Chow is a research scientist at Google Research. Prior to Google, he was a research scientist at DeepMind (from 2017 to 2019) and at Osaro, Inc. (from 2016 to 2017). He received his Ph.D. from Stanford's Institute for Computational and Mathematical Engineering (ICME) in 2017. He has published over 30 papers in major machine learning and control journals and conferences. His research focuses on deriving algorithms for risk-sensitive, safe, and robust control, sequential decision making, and (model-based and model-free) reinforcement learning, with applications to problems in robotics, power systems, and personalized recommendation.
Authored Publications
Demystifying Embedding Spaces using Large Language Models
Jihwan Jeong
Lior Shani
Martin Mladenov
The Twelfth International Conference on Learning Representations (2024)
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
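A minimal sketch of the injection mechanism the abstract describes: a small trained adapter maps a domain embedding into "soft tokens" in the LLM's input-embedding space, so the LLM can condition on it directly. All module names, dimensions, and the frozen-LLM setup below are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch (PyTorch) of injecting a domain embedding into an LLM's
# input sequence via a learned adapter. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class EmbeddingAdapter(nn.Module):
    """Maps a domain embedding (e.g., an item or user vector from a
    recommender) into a few soft tokens in the LLM's embedding space."""
    def __init__(self, domain_dim: int, llm_dim: int, n_soft_tokens: int = 4):
        super().__init__()
        self.n_soft_tokens = n_soft_tokens
        self.proj = nn.Linear(domain_dim, llm_dim * n_soft_tokens)

    def forward(self, domain_emb: torch.Tensor) -> torch.Tensor:
        # (batch, domain_dim) -> (batch, n_soft_tokens, llm_dim)
        batch = domain_emb.shape[0]
        return self.proj(domain_emb).view(batch, self.n_soft_tokens, -1)

# Usage: prepend the soft tokens to the prompt's token embeddings, then run
# the (here assumed frozen) LLM on the concatenated sequence.
adapter = EmbeddingAdapter(domain_dim=64, llm_dim=768)
item_emb = torch.randn(2, 64)          # stand-in recommender embeddings
prompt_embs = torch.randn(2, 10, 768)  # stand-in prompt token embeddings
llm_inputs = torch.cat([adapter(item_emb), prompt_embs], dim=1)
print(llm_inputs.shape)  # torch.Size([2, 14, 768])
```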
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta Reinforcement Learning
Anthony Liang
Erdem Biyik
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
We introduce a meta-reinforcement learning (meta-RL) approach, called DynaMITE-RL, to perform approximate inference in environments where the latent information evolves slowly between subtrajectories called sessions. We identify three key modifications to contemporary meta-RL methods: consistency of latent information during sessions, session masking, and prior latent conditioning. We demonstrate the necessity of these modifications on a range of downstream applications, from discrete Gridworld environments to continuous control and simulated robot-assistive tasks, and find that our approach significantly outperforms contemporary baselines.
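To make the first modification concrete, here is a minimal sketch of what a session-level consistency signal could look like: penalize variation of the inferred latent within a session, using a mask that marks session boundaries. The loss form, shapes, and mask convention are assumptions for illustration, not DynaMITE-RL's actual objective.

```python
# Illustrative session-consistency loss: the latent should stay constant
# within a session, so penalize step-to-step drift only inside sessions.
import torch

def session_consistency_loss(latents: torch.Tensor,
                             same_session: torch.Tensor) -> torch.Tensor:
    """latents: (T, d) per-timestep posterior means of the latent context.
    same_session: (T-1,) boolean, True where steps t and t+1 share a session."""
    diffs = latents[1:] - latents[:-1]          # (T-1, d)
    sq = (diffs ** 2).sum(dim=-1)               # (T-1,)
    mask = same_session.float()
    return (sq * mask).sum() / mask.sum().clamp(min=1.0)

# Usage: a toy trajectory of 6 steps with a session switch after step 3.
latents = torch.randn(6, 8)
same_session = torch.tensor([True, True, False, True, True])
print(session_consistency_loss(latents, same_session))
```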
Factual and Personalized Recommendation Language Modeling with Reinforcement Learning
Jihwan Jeong
Mohammad Ghavamzadeh
Proceedings of the First Conference on Language Modeling (COLM-24), Philadelphia (2024)
Recommender systems (RSs) play a central role in connecting users to products, content, and services by matching candidate items to users based on their preferences. While existing RSs often rely on implicit user feedback on recommended items (e.g., clicks, watches, ratings), conversational recommender systems interact with users to provide tailored recommendations in natural language. In this work, we aim to develop a recommender language model (LM) capable of generating compelling endorsements of relevant items: explaining the details of the items, connecting the items with users' preferences, and enhancing the likelihood that users accept the recommendations. Specifically, such an LM-based recommender can understand a user's preferences from the user's RS embeddings summarizing their feedback history, and output responses that are not only factually grounded but also explain, in a convincing manner, whether the items satisfy the user's preferences. The pivotal question is how to gauge the performance of such an LM recommender. Equipped with a joint reward function that measures factual consistency, convincingness, and personalization, we can not only evaluate the efficacy of different recommender LMs, but also use this metric as a form of AI feedback to fine-tune our LM agent via reinforcement learning (RL). Building upon the MovieLens movie recommendation benchmark, we develop a novel conversational recommender that delivers personalized movie narratives to users. This work lays the groundwork for recommender systems that prioritize individualized user experiences without compromising transparency and integrity.
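A hedged sketch of the joint-reward idea: three scores are combined into one scalar that can serve as AI feedback during RL fine-tuning. The scorers, weights, and signatures below are hypothetical placeholders, not the paper's actual models.

```python
# Illustrative joint reward combining factual consistency, convincingness,
# and personalization. All scorers and weights are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class RewardWeights:
    factuality: float = 1.0
    convincingness: float = 0.5
    personalization: float = 0.5

def joint_reward(response, item_facts, user_profile,
                 score_factuality,       # callable(response, item_facts) -> float
                 score_convincingness,   # callable(response) -> float
                 score_personalization,  # callable(response, user_profile) -> float
                 w=RewardWeights()):
    return (w.factuality * score_factuality(response, item_facts)
            + w.convincingness * score_convincingness(response)
            + w.personalization * score_personalization(response, user_profile))

# Usage with trivial stand-in scorers.
r = joint_reward("A heartfelt drama you may enjoy...", "Drama, 1994",
                 "likes character-driven dramas",
                 lambda resp, facts: 0.9,
                 lambda resp: 0.7,
                 lambda resp, prof: 0.8)
print(r)  # 0.9 + 0.5*0.7 + 0.5*0.8 = 1.65
```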
Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors
Christina Göpfert
Alex Haig
Ivan Vendrov
Tyler Lu
Hubert Pham
Mohammad Ghavamzadeh
ACM Transactions on Recommender Systems (2024)
Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue).
Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item, and using it to refine recommendation results. Leveraging concept activation vectors (CAVs) (Kim et al., 2018), a recently developed approach for model interpretability in machine learning, we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems. One novel feature of our approach is its ability to distinguish objective and subjective attributes (both subjectivity of degree and of sense), and to associate different senses of subjective attributes with different users.
We demonstrate on both synthetic and real-world data sets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations through interactive item critiquing.
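For intuition, a minimal end-to-end sketch of the CAV mechanic on synthetic data: following Kim et al. (2018), fit a linear classifier separating embeddings of items tagged with an attribute from negatives, and use the normalized weight vector as the attribute's direction for scoring candidates. The data below is synthetic, and the per-user sense disambiguation described above is beyond this toy example.

```python
# Hedged sketch: learn a CAV for a soft attribute as the normal vector of a
# linear classifier over item embeddings, then score items by projection.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 16
true_direction = rng.normal(size=d)

pos = rng.normal(size=(100, d)) + true_direction  # items with the attribute
neg = rng.normal(size=(100, d))                   # items without it
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

clf = LogisticRegression().fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])  # unit CAV for the attribute

# Score candidate items by projection onto the CAV (higher = more of the
# attribute), which supports critiques like "something funnier than this".
candidates = rng.normal(size=(5, d))
print(candidates @ cav)
```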
Embedding-Aligned Language Models
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent is trained, using a significantly smaller language model, to iteratively steer the LLM's generation towards optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent on the MovieLens 25M dataset, on extrapolation tasks for content gaps that satisfy latent user demand, and on multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.
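A schematic sketch of the loop the abstract describes, with the pre-trained LLM as the environment and embedding alignment as the reward. The functions embed, llm_rewrite, and propose_edit are hypothetical stubs, and the greedy acceptance rule is an assumption, not the paper's training procedure.

```python
# Hedged sketch of an EAGLE-style steering loop: a small agent proposes
# edits, a frozen LLM rewrites, and alignment with a target embedding is
# the reward. All components are toy stand-ins.
import numpy as np

def eagle_loop(text, target_emb, embed, llm_rewrite, propose_edit, steps=5):
    best = text
    best_score = float(embed(best) @ target_emb)
    for _ in range(steps):
        instruction = propose_edit(best, target_emb)  # agent action
        candidate = llm_rewrite(best, instruction)    # frozen LLM as environment
        score = float(embed(candidate) @ target_emb)  # reward: embedding alignment
        if score > best_score:                        # greedy acceptance (assumption)
            best, best_score = candidate, score
    return best

# Toy stand-ins: a letter-count "embedding" and a rewrite that appends words.
embed = lambda t: np.array([t.count(c) for c in "aeiouhrt"], dtype=float)
target = np.ones(8)
print(eagle_loop("a family movie", target, embed,
                 lambda t, i: t + ", " + i,
                 lambda t, tgt: "heartwarming and uplifting"))
```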
A Mixture-of-Expert Approach to RL-based Dialogue Management
Ofir Nachum
Dhawal Gupta
Moonkyung Ryu
Mohammad Ghavamzadeh
Proceedings of the Eleventh International Conference on Learning Representations (ICLR-23), Kigali, Rwanda (2023)
Despite recent advancements in language models (LMs), their application to dialogue management (DM) and their ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (often outputting generic utterances) and maximizes overall user satisfaction. However, existing RL approaches focus on training an agent that operates at the word level. Since generating semantically correct and sensible utterances from a large vocabulary space is combinatorially complex, RL can struggle to produce engaging dialogue, even if warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-expert (MoE) approach, which consists of (i) a language representation that captures diverse information, (ii) several modulated LMs (or experts) that generate candidate utterances, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. This MoE approach provides greater flexibility to generate sensible utterances of different intents and allows RL to focus on conversational-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness in both the diversity and sensibility of the generated utterances and the overall DM performance.
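A toy sketch of the action space this design induces: each expert proposes one candidate utterance, and the RL dialogue manager only scores and selects among the candidates instead of generating word-by-word. The experts and Q-function below are stand-ins, not the paper's models.

```python
# Illustrative conversational-level DM: RL planning reduces to an argmax
# over a handful of expert-generated candidates.
from typing import Callable, List

def moe_dialogue_manager(dialogue_state: str,
                         experts: List[Callable[[str], str]],
                         q_value: Callable[[str, str], float]) -> str:
    candidates = [expert(dialogue_state) for expert in experts]  # one per intent
    return max(candidates, key=lambda utt: q_value(dialogue_state, utt))

# Usage with toy experts for different intents and a trivial scorer.
experts = [lambda s: "Tell me more about that!",       # engagement expert
           lambda s: "How did that make you feel?",    # empathy expert
           lambda s: "Have you tried hiking nearby?"]  # suggestion expert
print(moe_dialogue_manager("User: I love the outdoors.",
                           experts, lambda s, u: float(len(u))))
```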
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Dhawal Gupta
Mohammad Ghavamzadeh
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS-23), New Orleans (2023)
Reinforcement learning (RL) has offered great promise for developing dialogue management (DM) agents that avoid being short-sighted, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in deep RL and language models (LMs), using RL to power conversational chatbots remains a formidable challenge. This is because deep RL algorithms require online exploration to learn effectively, but collecting fresh human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action space that these algorithms need to handle, as most LM agents generate responses at the word level. Leveraging recent advances in Mixture-of-Expert Language Models (MoE-LMs) that capture diverse semantics, generate utterances of different intents, and are amenable to multi-turn DM, we develop a gamut of offline RL algorithms that excel at dialogue planning. By exploiting the MoE-LM structure, our methods significantly reduce the action space and improve the efficacy of RL-based DM. We compare our methods with SOTA baselines on open-domain dialogues to demonstrate their effectiveness, both in the diversity of the generated utterances and in overall DM performance.
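To illustrate why the reduced action space matters for offline RL, a toy example: once logged actions are recorded as expert indices, plain tabular value-based updates over a fixed dataset become feasible. Everything below is synthetic; the paper's algorithms operate over learned dialogue representations rather than small discrete states.

```python
# Toy offline Q-learning over expert indices as actions, using only a fixed
# logged dataset (no online exploration). All data is synthetic.
import numpy as np

n_states, n_experts, gamma = 4, 3, 0.9
rng = np.random.default_rng(0)

logged = []
for _ in range(500):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_experts))
    r = float(a == s % n_experts) + 0.1 * rng.normal()  # expert s%3 is best in s
    s_next = int(rng.integers(n_states))
    logged.append((s, a, r, s_next))

Q = np.zeros((n_states, n_experts))
for _ in range(50):                 # repeated sweeps over the fixed dataset
    for s, a, r, s_next in logged:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += 0.1 * (target - Q[s, a])

print(Q.argmax(axis=1))  # greedy expert per state (expected: [0 1 2 0])
```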
Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors
Christina Göpfert
Ivan Vendrov
Tyler Lu
WWW22: The Web Conference 2022, Lyon, France, pp. 2411-2421
Interactive Recommender Systems (RSs) have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional RSs (e.g., clicks, item consumption, ratings), allowing users to express intent, preferences, constraints, and contexts in a richer fashion using natural language. Still, more research is needed to find the most effective ways to use this feedback. One major challenge is inferring a user's semantic intent given the open-ended terms (say, attributes or tags) used to describe a desired item, and using it to refine recommendation results.
Leveraging Concept Activation Vectors (CAVs) [13], we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in RSs. One novel feature of our approach is its ability to distinguish objective and subjective attributes (including subjectivity of degree and of sense) and to associate different senses of subjective attributes with different users. We demonstrate on both synthetic and real-world datasets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations.
Non-Stationary Off-policy Optimization
Joey Hong
Branislav Kveton
International Conference on Artificial Intelligence and Statistics (AISTATS) (2021)
Off-policy learning is a framework for estimating the value of, and optimizing, policies offline from logged data without deploying them. Real-world environments are nonstationary, and the optimized policies should be able to adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary environments. Our key idea is to use a change-point detector to partition the logged data into categorical latent states, then find a near-optimal policy conditioned on the latent state. We derive high-probability bounds on our off-policy estimates and optimization. Furthermore, we propose a practical approach for deploying our policy online and evaluate it comprehensively on a real-world clickstream dataset.
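A hedged sketch of the two-stage recipe in the abstract: detect change points in the logged data, then optimize a simple policy per latent segment. The sliding-window mean-shift detector and the per-segment empirical argmax below are deliberate simplifications, not the paper's estimators or bounds.

```python
# Toy piecewise-stationary bandit log: detect a change point, then pick the
# best arm per segment from the logged data only.
import numpy as np

def detect_change_points(rewards, window=50, threshold=0.3):
    """Naive sliding-window mean-shift detector (illustrative only)."""
    points = []
    for t in range(window, len(rewards) - window):
        gap = abs(rewards[t - window:t].mean() - rewards[t:t + window].mean())
        if gap > threshold and (not points or t - points[-1] > 2 * window):
            points.append(t)
    return points

rng = np.random.default_rng(0)
T = 600
actions = rng.integers(0, 2, size=T)
early, late = np.array([1.0, 0.8]), np.array([0.0, 0.3])  # arm means flip at t=300
means = np.where(np.arange(T) < 300, early[actions], late[actions])
rewards = means + 0.1 * rng.normal(size=T)

cps = detect_change_points(rewards)
for seg in np.split(np.arange(T), cps):
    # Per-segment "policy": the arm with the best empirical mean reward.
    best = max((0, 1), key=lambda a: rewards[seg][actions[seg] == a].mean())
    print(f"steps {seg[0]}-{seg[-1]}: best action = {best}")
```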
CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function-space embedding of the linear-program formulation of the Q-function, we obtain an optimization problem with generalized estimating-equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove that the obtained confidence intervals are valid in both asymptotic and finite-sample regimes. Empirically, we show on a variety of benchmarks that our confidence interval estimates are tighter and more accurate than those of existing methods.
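For context, a hedged sketch of the linear-programming view of policy evaluation that the abstract builds on, in standard Q-LP notation (not necessarily the paper's exact formulation):

```latex
% Standard LP view of off-policy evaluation: the target policy's normalized
% value \rho(\pi) is the optimum of a linear program over Q.
\begin{align*}
\rho(\pi) \;=\; \max_{Q} \;\; & (1-\gamma)\,
    \mathbb{E}_{s_0 \sim \mu_0,\; a_0 \sim \pi(\cdot \mid s_0)}
    \big[ Q(s_0, a_0) \big] \\
\text{s.t.} \;\; & Q(s,a) \;\le\; r(s,a) + \gamma\,
    \mathbb{E}_{s' \sim P(\cdot \mid s,a),\; a' \sim \pi(\cdot \mid s')}
    \big[ Q(s', a') \big] \quad \forall (s,a).
\end{align*}
% Embedding Q in a function space and applying generalized empirical
% likelihood to this program's Lagrangian yields the confidence intervals
% that CoinDICE computes.
```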