Li Yang
Authored Publications
Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
Martin Mladenov
James Pine
Hubert Pham
Shane Li
Xujian Liang
Anton Polishko
Ben Scheetz
Proceedings of the 47th International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR-24), Washington, DC (2024), pp. 2925-2929
Evaluation of policies in recommender systems (RSs) typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential user retention. In developing policies for onboarding new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of preference elicitation algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live, sometimes more reliably than live experiments due to the scale at which simulation can be realized. We describe our domain, our simulation models and platform, results of experiments and deployment, and suggest future steps needed to further realistic simulation as a powerful complement to live experiments.
MAVE: A Product Dataset for Multi-source Attribute Value Extraction
Qifan Wang
Anand Kulkarni
Bin Shu
Jon Elsas
WSDM 2022 (2022)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval and recommendations. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main and unique advantages: First, MAVE is the largest product attribute value extraction dataset by the number of attribute-value examples by 8x. Second, MAVE includes multi-source representations from the product, which captures the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values relative to what previous datasets cover.
Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is challenging for the latest attribute value extraction models, especially on zero-shot attribute extraction.
ETC: Encoding Long and Structured Inputs in Transformers
Anirudh Ravula
Joshua Ainslie
Qifan Wang
Vaclav Cvicek
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures, namely: scaling input length, and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows scaling attention to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.
Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach
Qifan Wang
D. Sivakumar
Bin Shu
Jon Elsas
SIGKDD 2020 (2020)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
It is an important research topic which has been widely studied in e-Commerce and relation learning.
There are two main limitations in existing attribute value extraction methods: scalability and generalizability.
Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications.
Moreover, very limited research has focused on generalizing extraction to new attributes.
In this work, we propose a novel approach for Attribute Value Extraction via Question Answering (AVEQA) using a multi-task framework.
In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context.
A unique BERT contextual encoder is adopted and shared across all attributes to encode both the context and the question, which makes the model scalable.
A distilled masked language model with knowledge distillation loss is introduced to improve the model generalization ability. In addition, we employ a no-answer classifier to explicitly handle the cases where there are no values for a given attribute in the product context.
The question answering, distilled masked language model and no-answer classification components are then combined into a unified multi-task framework.
We conduct extensive experiments on a public dataset. The results demonstrate that the proposed approach outperforms several state-of-the-art methods by a large margin.
View details
Preview abstract
Deep neural networks have been shown as a potentially powerful ansatz in variational Monte Carlo for solving quantum many-body problems. We propose two improvements in this direction. The first is graph neural ansatz (GNA), which is a variational wavefunction universal to arbitrary geometry. GNA results in accurate ground-state energies on 2D Kagome lattices, triangular lattices, and randomly connected graphs. Secondly, we design a distributed workflow on multiple accelerators to scale up the computation. We compute Kagome lattices with sizes up to 432 sites on 128 TPU cores. The parameter sharing nature of the GNA also leads to transferability across different system sizes and geometries.
Big Bird: Transformers for Longer Sequences
Guru Prashanth Guruganesh
Joshua Ainslie
Anirudh Ravula
Qifan Wang
NeurIPS (2020)
Transformer-based models, such as BERT, have been among the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis demonstrates the need for having O(1) global tokens, such as CLS, that attend to the entire sequence as part of the sparse attention. We show that the proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering.