Aza Tulepbergenov
Interested in using ML/AI to do something useful. Background in RL, NLP and Software Engineering.
Authored Publications
Factual and Personalized Recommendation Language Modeling with Reinforcement Learning
Jihwan Jeong
Mohammad Ghavamzadeh
Proceedings of the First Conference on Language Modeling (COLM-24), Philadelphia (2024)
Recommender systems (RSs) play a central role in connecting users to products, content and services by matching candidate items to users based on their preferences. While existing RSs often rely on implicit user feedback on recommended items (e.g., clicks, watches, ratings), conversational recommender systems interact with users to provide tailored recommendations in natural language. In this work, we aim to develop a recommender language model (LM) that is capable of generating compelling endorsement presentations of relevant items to users, to better explain the details of the items, to connect the items with users’ preferences, and to enhance the likelihood of users accepting recommendations. Specifically, such an LLM-based recommender can understand users’ preferences from users’ RS embeddings summarizing feedback history, and output corresponding responses that are not only factually grounded but also explain whether these items satisfy users’ preferences in a convincing manner. The pivotal question is how one can gauge the performance of such an LLM recommender. Equipped with a joint reward function that measures factual consistency, convincingness, and personalization, not only can we evaluate the efficacies of different recommender LMs, but we can also utilize this metric as a form of AI feedback to fine-tune our LLM agent via reinforcement learning (RL). Building upon the MovieLens movie recommendation benchmark, we developed a novel conversational recommender delivering personalized movie narratives to users. This work lays the groundwork for recommendation systems that prioritize individualized user experiences without compromising on transparency and integrity.
Demystifying Embedding Spaces using Large Language Models
Jihwan Jeong
Lior Shani
Martin Mladenov
The Twelfth International Conference on Learning Representations (2024)
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Dhawal Gupta
Mohammad Ghavamzadeh
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS-23), New Orleans (2023)
Reinforcement learning (RL) has offered great promise for developing dialogue management (DM) agents that avoid being short-sighted, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in deep RL and language models (LMs), using RL to power conversational chatbots remains a formidable challenge. This is because deep RL algorithms require online exploration to learn effectively, but collecting fresh human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action space that these algorithms need to handle, as most LM agents generate responses at the word level. Leveraging the recent advances of Mixture-of-Expert Language Models (MoE-LMs) that capture diverse semantics, generate utterances of different intents, and are amenable to multi-turn DM, we develop a gamut of offline RL algorithms that excel in dialogue planning. By exploiting the MoE-LM structure, our methods significantly reduce the action space and improve the efficacy of RL DM. We compare our methods with SOTA methods on open-domain dialogues to demonstrate their effectiveness both in the diversity of generated utterances and the overall DM performance.
A Mixture-of-Expert Approach to RL-based Dialogue Management
Ofir Nachum
Dhawal Gupta
Moonkyung Ryu
Mohammad Ghavamzadeh
Proceedings of the Eleventh International Conference on Learning Representations (ICLR-23), Kigali, Rwanda (2023)
Despite recent advancements in language models (LMs), their application to dialogue management (DM) and their ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (often outputting generic utterances) and maximizes overall user satisfaction. However, existing RL approaches focus on training an agent that operates at the word level. Since generating semantically correct and sensible utterances from a large vocabulary space is combinatorially complex, RL can struggle to produce engaging dialogue, even if warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-expert (MoE) approach, which consists of (i) a language representation that captures diverse information, (ii) several modulated LMs (or experts) to generate candidate utterances, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. This MoE approach provides greater flexibility to generate sensible utterances of different intents, and allows RL to focus on conversation-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in the diversity and sensibility of the generated utterances and in the overall DM performance.