Lior Shani
Authored Publications
Sort By
Embedding-Aligned Language Models
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Preview abstract
We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent it trained using a significantly smaller language model to iteratively stir the LLM's generation towards optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset, on extrapolation tasks for content gap to satisfy latent user demand, and multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.
View details
Reinforcement Learning with History Dependent Dynamic Contexts
Nadav Merlis
Martin Mladenov
Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, Hawaii
Preview abstract
We introduce a framework for modeling and solving reinforcement learning problems in non-Markovian, history-dependent environments. Our framework, called the Dynamic Contextual Markov Decision Process (DCMDP), generalizes the contextual MDP framework to handle non-Markov environments where contexts change over time. To overcome the exponential dependence on history, we leverage an aggregated mapping of previous visits to states, actions and contexts to construct an optimistic upper confidence-based algorithm, for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm that addresses history-dependent contexts, by planing in a latent space and using optimism over history-dependent features. We demonstrate the efficiency and performance of our approach on a recommendation task using the MovieLens dataset, in which the user's behavior is influenced by the agent's recommendations and changes over time.
View details
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Paul Roit
Johan Ferret
Geoffrey Cideron
Matthieu Geist
Sertan Girgin
Léonard Hussenot
Nikola Momchev
Piotr Stanczyk
Nino Vieillard
Olivier Pietquin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2023), 6252–6272
Preview abstract
Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries.
View details