Tianqi Liu
Research Areas
Authored Publications
Sort By
LiPO: Listwise Preference Optimization through Learning-to-Rank
Misha Khalman
Yao Zhao
Jialu Liu
Peter Liu
NAACL (2025)
Preview abstract
Aligning language models (LMs) with curated human feedback is critical to control their behaviors in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in a format of a ranked list over multiple responses to amortize the cost of reading prompt. Multiple responses can also be ranked by reward models or AI feedback. There lacks such a thorough study on directly fitting upon a list of responses. In this work, we formulate the LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. This view draws an explicit connection to Learning-to-Rank (LTR), where most existing preference optimization work can be mapped to existing ranking objectives. Following this connection, we provide an examination of ranking objectives that are not well studied for LM alignment, with DPO and SLiC as special cases when list size is two. In particular, we highlight a specific method, LiPO-𝜆, which leverages a state-of-the-art listwise ranking objective and weights each preference pair in a more advanced manner. We show that LiPO-𝜆 can outperform DPO variants and SLiC by a clear margin on several preference alignment tasks with both curated and real rankwise preference data.
View details
RMBoost: Reward Model Training With Preference-Conditional Multi-Aspect Synthetic Data Generation
Yennie Jun
Carl Yang
Michael Bendersky
Ran Xu
2025
Preview abstract
Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. They are trained using preference datasets where each example consists of one input prompt, two responses, and a preference label. As curating a high-quality human labeled preference dataset is both time-consuming and expensive, people often rely on existing powerful LLMs for preference label generation. This can potentially introduce noise and impede RM training. In this work, we present RMBoost, a novel synthetic preference data generation paradigm to boost reward model quality. The core idea of RMBoost is to first select a preference label and then directly generates the second more (or less) preferred response conditioned this preference label. Compared to traditional approaches where we first generate two responses and then obtain the preference label, RMBoost has two main advantages. First, RMBoost reduces labeling noise since preference pairs are constructed intentionally. Second, RMBoost allows for the creation of more diverse responses by incorporating various quality aspects (e.g., helpfulness, relevance, completeness) into the prompts We conduct extensive experiments on three diverse datasets and demonstrate that RMBoost outperforms other synthetic preference data generation techniques and significantly boosts the performance of five distinct reward models.
View details
Preference Distillation: Distilling Large Language Models with Teacher-Student Preference Pairs
Rongzhi Zhang
Feng Han
Chao Zhang
Michael Bendersky
Haorui Wang
Jialu Liu
2024
Preview abstract
Large Language Models (LLMs) have exhibited impressive capabilities across diverse range of tasks, yet their enormous parameter spaces present challenges in resource-constrained environments. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. Traditional methods like KL divergence can falter due to the student model's restricted expressivity, and relying solely on single teacher outputs may result in poorly calibrated student models. In this work, we propose a novel LLM distillation approach that leverages teacher-student preference pairs, steering the student's focus towards understanding the relative quality of outputs instead of merely replicating teacher outputs. This method offers dual benefits: it addresses the limitations of student model expressivity and improves sequence ranking calibration, thereby facilitating a more efficient knowledge transfer from teacher to student models. Extensive experiments on sequence generation tasks validate the effectiveness of our approach.
View details
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Jialu Liu
Michael Bendersky
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) (2024)
Preview abstract
Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these challenging ranking formulations. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL 2019&2020, PRP based on the Flan-UL2 model with 20B parameters performs favorably with the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, while outperforming other LLM-based solutions, such as InstructGPT which has 175B parameters, by over 10% for all ranking metrics. By using the same prompt template on seven BEIR tasks, PRP outperforms supervised baselines and outperforms the blackbox commercial ChatGPT solution by 4.2% and pointwise LLM-based solutions by more than 10% on average NDCG@10. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity.
View details
Knowledge Distillation with Perturbed Loss: From a Vanilla Teacher to a Proxy Teacher
Rongzhi Zhang
Jialu Liu
Michael Bendersky
Chao Zhang
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024), ACM, pp. 4278 - 4289
Preview abstract
Knowledge distillation is a popular technique to transfer knowledge from a large teacher model to a small student model. Typically, the student learns to imitate the teacher by minimizing the KL divergence of its output distribution with the teacher's output distribution. In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's output distribution and the ground truth label distribution. Therefore, forcing the student to blindly imitate the unreliable teacher output distribution leads to inferior performance. To this end, we propose a novel knowledge distillation objective PTLoss by first representing the vanilla KL-based distillation loss function via a Maclaurin series and then perturbing the leading-order terms in this series. This perturbed loss implicitly transforms the original teacher into a proxy teacher with a distribution closer to the ground truth distribution. We establish the theoretical connection between this "distribution closeness'' and the student model generalizability, which enables us to select the PTLoss's perturbation coefficients in a principled way. Extensive experiments on six public benchmark datasets demonstrate the effectiveness of PTLoss with teachers of different scales.
View details
Preview abstract
Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks. In real industrial applications such as web content classification, multiple classification tasks are predicted from the same input text such as a web article. However, at the serving time, the existing multitask transformer models such as prompt or adaptor based approaches need to conduct N forward passes for N tasks with O(N) computation cost. To tackle this problem, we propose a scalable method that can achieve stronger performance with close to O(1) computation cost via only one forward pass. To illustrate real application usage, we release a multitask dataset on news topic and style classification. Our experiments show that our proposed method outperforms strong baselines on both the GLUE benchmark and our news dataset.
View details
Preview abstract
Effectively modeling text-rich fresh content such as news articles
and blog posts is a challenging problem. To ensure a content-based
model generalize well to a broad range of applications, it is critical
to have a training dataset that is large beyond the scale of human
labels while achieving desired quality. In this work, we addressing those two challenges by proposing a novel approach to mine
semantically-relevant fresh documents, and their topic labels, with
little human supervision. Specifically, we design a multitask model
that alternate trains a contrasting learning with a multi-label classification to derive an universal document encoder. We show that
this approach can provide billions of high quality organic training examples and can be naturally extended to multilingual setting
where texts in different languages are encoded in the same semantic
space. We experimentally demonstrate NewsEmbed’s competitive
performance across multiple natural language understanding tasks,
both supervised and unsupervised.
View details
Preview abstract
Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performances on many NLP tasks. While being effective, these pre-training methods typically demand massive computation resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token is replaced by a generator. However, this new task, as a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing bottom layers of the generator and the discriminator. Extensive experiments on GLUE and SQuAD datasets demonstrate both the effectiveness and the efficiency of our proposed method.
View details
Preview abstract
Accurate predictions of customers' future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, in turn, customize marketing messages or advertising copies to serve customers in different segments better. Furthermore, LTV predictions can directly inform marketing budget allocations and improve real-time targeting and bidding of ad impressions.
One challenge of LTV modeling is that some customers never come back, and the distribution of LTV can be heavy-tailed. The commonly used mean squared error (MSE) loss does not accommodate the significant fraction of zero value LTV from one-time purchasers and can be sensitive to extremely large LTV's from top spenders. In this article, we model the distribution of LTV given associated features as a mixture of zero point mass and lognormal distribution, which we refer to as the zero-inflated lognormal (ZILN) distribution. This modeling approach allows us to capture the churn probability and account for the heavy-tailedness nature of LTV at the same time. It also yields straightforward uncertainty quantification of the point prediction. The ZILN loss can be used in both linear models and deep neural networks (DNN). For model evaluation, we recommend the normalized Gini coefficient to quantify model discrimination and decile charts to assess model calibration. Empirically, we demonstrate the predictive performance of our proposed model on two real-world public datasets.
View details