Le Yan

Authored Publications
    Regression Compatible Listwise Objectives for Calibrated Ranking with Binary Relevance
    Pratyush Kar
    Bing-Rong Lin
    Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023)
    As Learning-to-Rank (LTR) approaches primarily seek to improve ranking quality, their output scores are not scale-calibrated by design. This fundamentally limits LTR usage in score-sensitive applications. Though a simple multi-objective approach that combines a regression and a ranking objective can effectively learn scale-calibrated scores, we argue that the two objectives are not necessarily compatible, which makes the trade-off less ideal for either of them. In this paper, we propose a practical regression compatible ranking (RCR) approach that achieves a better trade-off, where the ranking and regression components are proved to be mutually aligned. Although the same idea applies to ranking with both binary and graded relevance, we mainly focus on binary labels in this paper. We evaluate the proposed approach on several public LTR benchmarks and show that it consistently achieves either the best or competitive results in terms of both regression and ranking metrics, and significantly improves the Pareto frontiers in the context of multi-objective optimization. Furthermore, we evaluated the proposed approach on YouTube Search and found that it not only improved the ranking quality of the production pCTR model, but also brought gains to the click prediction accuracy. The proposed approach has been successfully deployed in the YouTube production system.
    The distillation of ranking models has become an important topic in both academia and industry. In recent years, several advanced methods have been proposed to tackle this problem, often leveraging ranking information from teacher rankers that is absent in traditional classification settings. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field. This paper first examines representative prior art on ranking distillation, and raises three questions to be answered around methodology and reproducibility. To that end, we propose a systematic and unified benchmark, Ranking Distillation Suite (RD-Suite), which is a suite of tasks with 4 large real-world datasets, encompassing two major modalities (textual and numeric) and two applications (standard distillation and distillation transfer). RD-Suite consists of benchmark results that challenge some of the common wisdom in the field, and the release of datasets with teacher scores and evaluation scripts for future research. RD-Suite paves the way towards better understanding of ranking distillation, facilitates more research in this direction, and presents new challenges.
    Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods ignored: the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which would possess relevance information. We give both theoretical analysis and empirical results to show the negative effects on the relevance tower due to such a correlation. We then propose two methods to mitigate the negative confounding effects by better disentangling relevance and bias. Offline empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches. We conduct a live experiment on a popular web store for four weeks, and find a significant improvement in user clicks over the baseline, which ignores the negative confounding effect.
    Revisiting two tower models for unbiased learning to rank
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022), 2410–2414
    The two-tower architecture, in which one tower factorizes out position-related bias, has become a common technique in neural network ranking models for Unbiased Learning To Rank (ULTR). In these models, a neural network tower taking in all position-related features is designed to model the biases, which are equivalent to the propensity scores used to define the unbiased ranking metrics. It relies on the assumptions that a user interaction (click) is conditioned on the user observing a ranked item, and that only the observation probability depends on the position. So if we factorize out the observation probability, we can then rank the items without bias by their click rate conditioned on observation. The assumption appears sensible, and the additive two-tower models based on it have been widely implemented in ULTR. However, two-tower models may not always work and sometimes work even worse than the biased models, as users may not always follow the same pattern. In this work, we stick to the plausible assumption about the user interaction, but we also consider the spectrum of different user behaviors. In this case, the position-related observation probability may not be explicitly factorizable. We also study generic methods to treat this complexity and show these methods could outperform the simple additive debias models in offline experiments.
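    The factorization assumption described in this abstract can be illustrated with a toy example. The sketch below is not the paper's model: the observation probabilities, item names, and helper functions are all hypothetical, and it assumes every user shares a single observation pattern, which is the very assumption the paper questions.

```python
# Hypothetical observation probabilities by position (position 1 is seen most often).
OBSERVE_PROB = {1: 1.0, 2: 0.6, 3: 0.3}

def click_prob(relevance_prob, position):
    """Click probability under the factorization:
    P(click) = P(observe | position) * P(click | observed)."""
    return OBSERVE_PROB[position] * relevance_prob

def debiased_click_rate(observed_click_rate, position):
    """Factor out the observation probability to recover P(click | observed)."""
    return observed_click_rate / OBSERVE_PROB[position]

# Hypothetical log: item -> (true P(click | observed), logged position).
# Item "a" is less relevant but was shown first; item "b" is more relevant but shown third.
logged = {"a": (0.4, 1), "b": (0.8, 3)}

# Raw click rates as they would appear in the biased log.
raw = {item: click_prob(rel, pos) for item, (rel, pos) in logged.items()}

# Click rates after dividing out the position-dependent observation probability.
debiased = {item: debiased_click_rate(raw[item], logged[item][1]) for item in logged}

biased_order = sorted(raw, key=raw.get, reverse=True)
unbiased_order = sorted(debiased, key=debiased.get, reverse=True)
print(biased_order, unbiased_order)
```

    Ranking by raw click rate favors the item that was logged at the top position, while ranking by the debiased rate recovers the order implied by relevance. When users do not all follow one observation pattern, a single table such as `OBSERVE_PROB` no longer exists, which is the complication the abstract raises.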
    Scale Calibration of Deep Ranking Models
    28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2022), pp. 4300-4309
    Learning-to-Rank (LTR) systems are ubiquitous in web applications nowadays. The existing literature mainly focuses on improving ranking performance by trying to generate the optimal order of candidate items. However, virtually all advanced ranking functions are not scale calibrated. For example, rankers have the freedom to add a constant to all item scores without changing their relative order. This property has resulted in several limitations in deploying advanced ranking methods in practice. On the one hand, it limits the use of effective ranking functions in important applications. For example, in ads ranking, predicted Click-Through Rate (pCTR) is used for ranking and is required to be calibrated for the downstream ads auction. This is a major reason that existing ads ranking methods use scale calibrated pointwise loss functions that may sacrifice ranking performance. On the other hand, popular ranking losses are translation-invariant. We rigorously show that, both theoretically and empirically, this property leads to training instability that may cause severe practical issues. In this paper, we study how to perform scale calibration of deep ranking models to address the above concerns. We design three different formulations to calibrate ranking models through calibrated ranking losses. Unlike existing post-processing methods, our calibration is performed during training, which can resolve the training instability issue without any additional processing. We conduct experiments on the standard LTR benchmark datasets and one of the largest sponsored search ads datasets from Google. Our results show that our proposed calibrated ranking losses can achieve nearly optimal results in terms of both ranking quality and score scale calibration.
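    The translation invariance mentioned in this abstract can be checked numerically. The sketch below assumes the listwise softmax cross-entropy as a representative ranking loss and the sigmoid cross-entropy as a representative calibrated pointwise loss; the function names, scores, and labels are illustrative and not taken from the paper.

```python
import math

def softmax_ce(scores, labels):
    """Listwise softmax cross-entropy over the items of one query."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum(y * (s - log_z) for s, y in zip(scores, labels))

def sigmoid_ce(scores, labels):
    """Pointwise sigmoid cross-entropy, summed over items (stable form)."""
    return sum(max(s, 0.0) - y * s + math.log1p(math.exp(-abs(s)))
               for s, y in zip(scores, labels))

scores = [2.0, 0.5, -1.0]
labels = [1.0, 0.0, 0.0]
shifted = [s + 10.0 for s in scores]  # add the same constant to every score

# The ranking loss cannot tell the two score vectors apart...
ranking_gap = abs(softmax_ce(scores, labels) - softmax_ce(shifted, labels))
# ...while the pointwise loss changes substantially.
pointwise_gap = abs(sigmoid_ce(scores, labels) - sigmoid_ce(shifted, labels))
print(ranking_gap, pointwise_gap)
```

    Because the softmax loss is unchanged by the shift, nothing in that loss anchors the absolute scale of the scores, whereas the pointwise loss penalizes the shifted scores heavily.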
    We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers. Unlike the existing ranking distillation work which pursues a good trade-off between performance and efficiency, SDR is able to significantly improve ranking performance of students over the teacher rankers without increasing model capacity. The key success factors of SDR, which differ from common distillation techniques for classification, are: (1) an appropriate teacher score transformation function, and (2) a novel listwise distillation framework. Both techniques are specifically designed for ranking problems and are rarely studied in the existing knowledge distillation literature. Building upon the state-of-the-art neural ranking structure, SDR is able to push the limits of neural ranking performance above a recent rigorous benchmark study and significantly outperforms traditionally strong gradient boosted decision tree based models on 7 out of 9 key metrics, for the first time in the literature. In addition to the strong empirical results, we give theoretical explanations on why listwise distillation is effective for neural rankers, and provide ablation studies to verify the necessity of the key factors in the SDR framework.
    Multiclass classification (MCC) is a fundamental machine learning problem of classifying each instance into one of a predefined set of classes. Given an instance, an MCC model computes a score for each class, all of which are used to sort the classes. The performance of a model is usually measured by Top-K Accuracy/Error (e.g. K=1 or 5). In this paper, we do not aim to propose new neural network architectures as most recent works do, but to show that it is promising to boost MCC performance with a novel formulation through the lens of ranking. In particular, by viewing MCC as an instance class ranking problem, we first argue that ranking metrics, such as Normalized Discounted Cumulative Gain, can be more informative than the existing Top-K metrics. We further demonstrate that the dominant neural MCC recipe can be transformed to a neural ranking pipeline. Based on such generalization, we show that it is intuitive to leverage techniques from the rich information retrieval literature to improve the MCC performance out of the box. Extensive empirical results on both text and image classification tasks with diverse datasets and backbone neural models show the value of our proposed framework.
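    The claim that a ranking metric can be more informative than Top-K accuracy can be made concrete with a small example. The sketch below assumes a single true class per instance, in which case the ideal DCG is 1 and NDCG reduces to 1 / log2(1 + rank of the true class). The function names and the score vectors are hypothetical.

```python
import math

def rank_of_true_class(class_scores, true_class):
    """1-based rank of the true class when classes are sorted by descending score."""
    true_score = class_scores[true_class]
    return 1 + sum(1 for s in class_scores if s > true_score)

def top_k_accuracy(class_scores, true_class, k):
    """1.0 if the true class is among the k highest-scoring classes, else 0.0."""
    return 1.0 if rank_of_true_class(class_scores, true_class) <= k else 0.0

def ndcg_single_label(class_scores, true_class):
    """NDCG with exactly one relevant class: 1 / log2(1 + rank)."""
    return 1.0 / math.log2(1 + rank_of_true_class(class_scores, true_class))

true_class = 2
model_a = [0.1, 0.5, 0.4, 0.0]  # true class is ranked 2nd
model_b = [0.3, 0.5, 0.1, 0.2]  # true class is ranked 4th

top1_a = top_k_accuracy(model_a, true_class, k=1)
top1_b = top_k_accuracy(model_b, true_class, k=1)
ndcg_a = ndcg_single_label(model_a, true_class)
ndcg_b = ndcg_single_label(model_b, true_class)
print(top1_a, top1_b, ndcg_a, ndcg_b)
```

    Both hypothetical models get the same Top-1 accuracy on this instance, yet NDCG separates them because it still credits the model that places the true class closer to the top.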
    Existing work on search result diversification typically falls into the "next document" paradigm, that is, selecting the next document based on the ones already chosen. A sequential process of selecting documents one-by-one is naturally modeled in learning-based approaches. However, such a process makes the learning difficult because there are an exponential number of ranking lists to consider. Sampling is usually used to reduce the computational complexity but this makes the learning less effective. In this paper, we propose a soft version of the "next document" paradigm in which we associate each document with an approximate rank, and thus the subtopics covered prior to a document can also be estimated. We show that we can derive differentiable diversification-aware losses, which are smooth approximations of diversity metrics like alpha-NDCG, based on these estimates. We further propose to optimize the losses in the learning-to-rank setting using neural distributed representations of queries and documents. Experiments are conducted on the public benchmark TREC datasets. By comparing with an extensive list of baseline methods, we show that our Diversification-Aware LEarning-TO-Rank (DALETOR) approaches outperform them by a large margin, while being much simpler during learning and inference.
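    One common way to attach an approximate rank to each document is to replace the hard comparison between scores with a sigmoid. The abstract does not state which approximation the paper uses, so the formula below (1 plus a sum of sigmoids of scaled score differences), the `temperature` parameter, and the function name are assumptions made purely for illustration.

```python
import math

def approx_ranks(scores, temperature=0.1):
    """Smooth rank of each document:
    rank_i = 1 + sum over j != i of sigmoid((s_j - s_i) / temperature)."""
    ranks = []
    for i, s_i in enumerate(scores):
        r = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                # Close to 1 when document j clearly outscores document i, else close to 0.
                r += 1.0 / (1.0 + math.exp(-(s_j - s_i) / temperature))
        ranks.append(r)
    return ranks

scores = [3.0, 1.0, 2.0]
soft = approx_ranks(scores)
print(soft)
```

    With a small temperature the soft ranks sit close to the hard ranks, and because every operation is differentiable in the scores, quantities built on these ranks can be optimized with gradient-based methods.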
    Despite the success of neural models in many major machine learning problems and recently published neural learning to rank (LTR) papers in top venues, the effectiveness of neural models on traditional LTR problems is still not widely acknowledged. We first validate the concern by showing that most recent neural LTR models are, by a large margin, inferior to the best publicly available tree-based implementation, which is sometimes ignored in recent neural LTR papers. We then investigate why existing neural LTR suffers by identifying several of its weaknesses. To that end, we propose a new neural LTR framework that mitigates these weaknesses, by borrowing ideas from several research fields. Our models are able to perform comparably with the strong tree-based baseline, while outperforming recently published neural learning to rank methods by a large margin. Our results also serve as a benchmark for neural learning to rank models.