Minmin Chen

Authored Publications
    Learning to Augment for Casual User Recommendation
    Elaine Le
    Jianling Wang
    Yuyan Wang
    The ACM Web Conference 2022 (2022)
    Abstract: Users who come to recommendation platforms are heterogeneous in activity levels. There usually exists a group of core users who visit the platform regularly and consume a large body of content upon each visit, while others are casual users who tend to visit the platform occasionally and consume less each time. As a result, consumption activities from core users often dominate the training data. Because core users can exhibit different activity patterns from casual users, recommender systems trained on historical user activity data usually perform much worse on casual users than on core users. To bridge the gap, we propose a model-agnostic framework, L2Aug, that improves recommendations for casual users through data augmentation without sacrificing core user experience. L2Aug is powered by a data augmentor that learns to generate augmented interaction sequences, in order to fine-tune and optimize the performance of the recommendation system for casual users. On four real-world public datasets, L2Aug outperforms other treatment methods and achieves the best sequential recommendation performance for both casual and core users. We also test L2Aug in an online simulation environment with real-time feedback to further validate its efficacy, and showcase its flexibility in supporting different augmentation actions.
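As a rough illustration of the augmentation step described above, the sketch below (all item ids and probabilities invented, not the paper's implementation) shows an augmentor emitting per-interaction keep probabilities that turn a core user's long history into sparser, casual-looking training sequences.

```python
# Hypothetical sketch of the L2Aug augmentation step: a learned augmentor
# scores each interaction in a core user's history with a keep probability;
# sampling a mask yields a sparser, casual-style sequence for fine-tuning.
import numpy as np

rng = np.random.default_rng(0)

def augment_sequence(seq, keep_probs, rng):
    """Sample a casual-looking subsequence of a core-user history."""
    mask = rng.random(len(seq)) < keep_probs
    return [item for item, keep in zip(seq, mask) if keep]

history = [101, 205, 307, 42, 913, 77]                 # toy item ids
keep_probs = np.array([0.9, 0.3, 0.8, 0.2, 0.7, 0.4])  # augmentor outputs
print(augment_sequence(history, keep_probs, rng))      # a sparser session
```

In the paper the augmentor itself is trained, so the keep decisions are optimized against the recommender's casual-user performance rather than fixed as they are here.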
    Surrogate for Long-Term User Experience in Recommender Systems
    Can Xu
    Lisa Mijung Chung
    Mohit Sharma
    Qian Sun
    Sriraj Badam
    Yuyan Wang
    KDD 2022 (2022)
    Abstract: Over the years we have seen recommender systems shift focus from optimizing short-term engagement toward improving long-term user experience on their platforms. While defining good long-term user experience is still an active research area, we focus here on one specific aspect of it: users revisiting the platform. These long-term outcomes, however, are much harder to optimize, due to the sparsity of observed events and the low signal-to-noise ratio (weak connection) between these outcomes and a single recommendation. To address these challenges, we propose to establish the association between these long-term outcomes and a set of more immediate-term user behavior signals that can serve as surrogates for optimization. To this end, we conduct a large-scale study of user behavior logs on one of the largest industrial recommendation platforms, serving billions of users. We study a broad set of sequential user behavior patterns and standardize a procedure to pinpoint the subset with strong predictive power for the change in users' long-term visiting frequency. Specifically, these behaviors are predictive of increased visits to the platform over 5 months, among users who started with the same visiting frequency. We validate the identified subset of user behaviors by incorporating them as reward surrogates for long-term user experience in a reinforcement learning (RL) based recommender. Results from multiple live experiments on the industrial recommendation platform demonstrate the effectiveness of the proposed surrogates in improving long-term user experience.
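The sketch below illustrates the general shape of such a surrogate pipeline (it is not the paper's exact procedure; all feature names and the blending weight are hypothetical): screen candidate immediate-term signals by how strongly they predict the long-term outcome, then fold the selected ones into the RL reward.

```python
# Illustrative surrogate screening and reward blending; synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n_users = 10_000
# Columns (hypothetical): [diverse_consumption, repeat_topic_visits, saves]
X = rng.random((n_users, 3))
# Label: did the user's visiting frequency increase 5 months later?
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.2, n_users) > 0.5

# Simple predictive-power screen: correlation of each signal with the label.
corrs = [np.corrcoef(X[:, j], y.astype(float))[0, 1] for j in range(3)]
print(corrs)  # the first signal stands out as a surrogate candidate

def shaped_reward(engagement, behaviors, surrogate_weights, alpha=0.5):
    """Blend immediate engagement with the surrogate score for RL training."""
    return engagement + alpha * float(behaviors @ surrogate_weights)
```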
    Reward Shaping for User Satisfaction in a REINFORCE Recommender
    Can Xu
    Sriraj Badam
    Trevor Potter
    Daniel Li
    Hao Wan
    Elaine Le
    Chris Berg
    Eric Bencomo Dixon
    (2021)
    Abstract: How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting the sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, surveys explicitly asking users to rate their experience with consumed items have been found to provide valuable information orthogonal to the engagement/interaction data, acting as a proxy for the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting satisfaction levels for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the role of the imputation network is to learn which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments on an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
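A minimal sketch of this joint setup, with invented layer sizes and shaping weight (not the production architecture): an imputation network predicts satisfaction for a (state, item) pair, and its detached prediction shapes the reward in a REINFORCE-style loss.

```python
# Minimal sketch of jointly using a satisfaction imputer and a REINFORCE
# policy; all dimensions and the shaping weight alpha are invented here.
import torch
import torch.nn as nn

n_items, d = 100, 16
policy = nn.Linear(d, n_items)       # user state -> item logits
imputer = nn.Linear(d + n_items, 1)  # (state, item) -> satisfaction

def reinforce_loss(state, item_onehot, engagement, alpha=0.5):
    """state: (B, d); item_onehot: (B, n_items); engagement: (B,)."""
    # Imputed satisfaction (the imputer is trained on sparse survey labels).
    sat = imputer(torch.cat([state, item_onehot], dim=-1)).squeeze(-1)
    reward = engagement + alpha * sat.detach()         # shaped reward
    log_prob = torch.log_softmax(policy(state), dim=-1)
    chosen_log_prob = (log_prob * item_onehot).sum(-1)
    return -(reward * chosen_log_prob).mean()          # REINFORCE objective
```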
    Abstract: Most existing recommender systems primarily focus on the users (content consumers), matching users with the most relevant content with the goal of maximizing user satisfaction on the platform. However, given that content providers play an increasingly critical role through content creation, largely determining the content pool available for recommendation, a natural question arises: can we design recommenders that take into account the utilities of both users and content providers? By doing so, we hope to sustain the flourishing of more content providers and a diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper investigates one approach toward building a content provider-aware recommender and evaluates its impact under a simulated setup. To characterize the user-recommender-provider interdependence, we complement user modeling by formalizing provider dynamics as a parallel Markov Decision Process of partially observable states, transitioned by recommender actions and user feedback. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the content provider associated with the chosen content, which we show to be equivalent to maximizing overall user utility and the utilities of all content providers on the platform. To evaluate our approach, we also introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach. These results serve to understand how and when a content provider-aware recommender agent benefits the building of multi-stakeholder recommender systems.
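To make the joint objective concrete, here is a hedged toy sketch (the utility estimates and the trade-off weight beta are invented, not the paper's API): the reward for a recommendation combines the user's utility with the provider's counterfactual utility lift.

```python
# Toy sketch of the EcoAgent-style joint reward: user utility plus the
# lift in provider utility relative to the counterfactual of not
# recommending the item. All inputs are hypothetical model estimates.
def ecoagent_reward(user_utility, provider_utility_with,
                    provider_utility_without, beta=1.0):
    provider_lift = provider_utility_with - provider_utility_without
    return user_utility + beta * provider_lift

print(ecoagent_reward(user_utility=0.7,
                      provider_utility_with=0.4,
                      provider_utility_without=0.1))  # 0.7 + 1.0*0.3 = 1.0
```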
    Values of Exploration in Recommender Systems
    Can Xu
    Elaine Le
    Mohit Sharma
    Su-Lin Wu
    Yuyan Wang
    RecSys (2021)
    Abstract: Reinforcement Learning (RL) has been sought after to power next-generation recommender systems and improve user experience on recommendation platforms. While the exploration-exploitation tradeoff is a foundation of RL research, the value of exploration in RL-based recommender systems is less well understood. Exploration, commonly seen as a tool to reduce model uncertainty in regions with sparse user interaction/feedback, is believed to cost user experience in the short term, while the indirect benefit of better model quality arrives later. We argue, on the other hand, that recommender systems have an inherent need for exploration, and that exploration can improve user experience even in the more immediate term. We focus on understanding the role of exploration in changing different facets of recommendation quality that more directly impact user experience. To do so, we introduce a series of methods inspired by exploration research to increase exploration in an RL-based recommender system, and study their effect on the end recommendation quality, specifically accuracy, diversity, novelty, and serendipity. We propose a set of metrics to measure RL-based recommender systems along these four aspects and evaluate the impact of the exploration-inducing methods against them. In addition to the offline measurements, we conduct live experiments on an industrial recommendation platform serving billions of users to showcase the benefit of exploration. Moreover, we use user conversion as an indicator of holistic long-term user experience and study the value of exploration in helping platforms convert users. Connecting the offline analyses and live experiments, we begin building connections between these four facets of recommendation quality and long-term user experience, and identify serendipity as a desirable recommendation quality that changes user states and improves long-term user experience.
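The paper defines its own metrics; as a rough reference point, the sketch below implements common textbook forms of three of the four facets (these formulas are illustrative assumptions, not the paper's definitions).

```python
# Common illustrative definitions for diversity, novelty, and serendipity.
import numpy as np

def diversity(item_embs):
    """1 - mean pairwise cosine similarity within a slate; item_embs: (n, d)."""
    normed = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = len(item_embs)
    mean_off_diag = (sim.sum() - n) / (n * (n - 1))
    return 1.0 - mean_off_diag

def novelty(popularity):
    """Mean self-information of recommended items: rarer items are more novel."""
    return float(np.mean(-np.log2(popularity)))

def serendipity(relevant, unexpected):
    """Fraction of recommendations that are both relevant and unexpected."""
    return float(np.mean(np.asarray(relevant) & np.asarray(unexpected)))
```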
    Deconfounding User Satisfaction Estimation from Response Rate Bias
    Madeleine Traverse
    Trevor Potter
    Emma Marriott
    Daniel Li
    Chris Haulk
    Proceedings of the 14th ACM Conference on Recommender Systems (2020)
    Abstract: Improving user satisfaction is at the forefront of industrial recommender systems. While significant progress in recommender systems has relied on logged implicit data of user-item interactions (i.e., clicks, dwell/watch time, and other user engagement signals), there has been a recent surge of interest in measuring and modeling user satisfaction as provided by orthogonal data sources. Such data sources typically originate from user satisfaction surveys, which explicitly ask users to rate their experience with the system and/or specific items they have consumed in the recent past. This data can be valuable for measuring and modeling the degree to which a user has had a satisfactory experience with the recommender, since what users do (engagement) does not always align with what users say they want (satisfaction as measured by surveys). We focus on a large-scale industrial system trained on user survey responses to predict user satisfaction. The predictions of the satisfaction model for each user-item pair, combined with the predictions of other models (e.g., engagement-focused ones), are fed into the ranking component of a real-world recommender system when deciding which items to present to the user. It is therefore imperative that the satisfaction model does an equally good job of imputing user satisfaction across slices of users and items, as it directly impacts which items a user is exposed to. However, the data used for training satisfaction models is biased in that users are more likely to respond to a survey when they are more satisfied. When the satisfaction survey responses in slices of data with high response rates follow a different distribution than those with low response rates, response rate becomes a confounding factor for user satisfaction estimation. We find a positive correlation between response rate and ratings in a large-scale survey dataset collected in our case study. To address this inherent response rate bias in the satisfaction data, we propose an inverse propensity weighting approach within a multi-task learning framework. We extend a simple feed-forward neural network architecture predicting user satisfaction to a shared-bottom multi-task learning architecture with two tasks: user satisfaction estimation and response rate estimation. We train these two tasks concurrently, and use the inverse of the response rate predictions as loss weights for the satisfaction task to address the response rate bias. We show that by doing this, (i) we can accurately model whether a user will respond to a survey, (ii) we improve the user satisfaction estimation error for data slices with lower propensity to respond without hurting that of slices with higher propensity to respond, and (iii) in live A/B experiments, applying the resulting satisfaction predictions to rank recommendations translates to higher user satisfaction.
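A minimal sketch of the shared-bottom setup described above, with assumed layer sizes: one head imputes satisfaction, one predicts survey response propensity, and the inverse of the (detached) propensity prediction weights the satisfaction loss.

```python
# Sketch of shared-bottom multi-task learning with inverse propensity
# weighting; dimensions and the clamping constants are invented here.
import torch
import torch.nn as nn

d_in, d_hid = 32, 16
shared = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
sat_head = nn.Linear(d_hid, 1)   # predicted satisfaction rating
resp_head = nn.Linear(d_hid, 1)  # predicted response propensity (logit)

def mtl_loss(x, rating, responded):
    """x: (B, d_in); rating: (B,); responded: (B,) floats in {0., 1.}."""
    h = shared(x)
    p_resp = torch.sigmoid(resp_head(h)).squeeze(-1)
    resp_loss = nn.functional.binary_cross_entropy(p_resp, responded)
    # Inverse propensity weights, applied only where a rating was observed.
    w = (1.0 / p_resp.detach().clamp(min=1e-3)) * responded
    sat_err = (sat_head(h).squeeze(-1) - rating) ** 2
    sat_loss = (w * sat_err).sum() / w.sum().clamp(min=1e-6)
    return sat_loss + resp_loss
```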
    Abstract: Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data. Long Range Dependence (LRD), referring to long-range correlations that decay as a power law rather than exponentially with distance, demands a different set of tools for modeling the underlying dynamics of sequential data. While it has been widely conjectured that LRD is present in language modeling and sequential recommendation, the amount of LRD in the corresponding sequential datasets has not yet been quantified in a scalable and model-independent manner. We propose a principled estimation procedure for LRD in sequential datasets, based on established LRD theory for real-valued time series, and apply it to sequences of symbols with million-item-scale dictionaries. In our measurements, the procedure reliably estimates the LRD in the behavior of users as they write Wikipedia articles and as they interact with YouTube. We further show that measuring LRD better informs modeling decisions, in particular for RNNs, whose ability to capture LRD is still an active area of research. The quantitative measure of LRD informs new Evolutive Recurrent Neural Network (EvolutiveRNN) designs, leading to state-of-the-art results on language understanding and sequential recommendation tasks at a fraction of the computational cost.
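For intuition on what such an estimate looks like for real-valued series, here is a classical aggregated-variance Hurst-exponent estimator (a standard textbook method, not necessarily the paper's procedure; the paper additionally handles large symbolic dictionaries). It fits the power law Var(block mean) ~ m^(2H-2); H > 0.5 indicates long-range dependence.

```python
# Aggregated-variance Hurst estimator: slope of log-variance vs. log-block
# size equals 2H - 2; H near 0.5 means no long-range dependence.
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(4, 8, 16, 32, 64)):
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_v, 1)[0]  # slope = 2H - 2
    return slope / 2 + 1

rng = np.random.default_rng(0)
print(hurst_aggregated_variance(rng.normal(size=10_000)))  # ~0.5 for iid noise
```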
    Top-K Off-Policy Correction for a REINFORCE Recommender System
    Alex Beutel
    Paul Covington
    Sagar Jain
    Francois Belletti
    ACM International Conference on Web Search and Data Mining (WSDM) (2019)
    Abstract: Industrial recommender systems deal with extremely large action spaces: many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making for a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e., REINFORCE [48]. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases when learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube.
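A small sketch of the top-K correction factor as derived in the paper (notation simplified): the standard importance ratio π(a|s)/β(a|s) between the current policy π and the behavior policy β is multiplied by λ_K = K·(1−π(a|s))^(K−1), which accounts for the item appearing anywhere in a size-K slate.

```python
# Top-K off-policy corrected REINFORCE weight for a single (state, action).
def topk_correction(pi_a, beta_a, k):
    """pi_a: prob. under current policy; beta_a: prob. under behavior policy."""
    importance = pi_a / beta_a              # standard off-policy ratio
    lambda_k = k * (1.0 - pi_a) ** (k - 1)  # top-K multiplier
    return importance * lambda_k

# For a low-probability item under the current policy the multiplier stays
# close to K; for items the policy already favors, it shrinks toward zero.
print(topk_correction(pi_a=0.01, beta_a=0.05, k=16))  # ~2.75
```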
    Abstract: Understanding temporal dynamics has proved highly valuable for accurate recommendation. Sequential recommenders have been successful in modeling the dynamics of users and items over time. However, while different model architectures excel at capturing different temporal ranges or dynamics, distinct application contexts require adapting to diverse behaviors. In this paper we examine how to build a model that can make use of different temporal ranges and dynamics depending on the request context. We begin with an analysis of an anonymized YouTube dataset comprising millions of user sequences. We quantify the degree of long-range dependence in these sequences and demonstrate that both short-term and long-term dependent behavioral patterns co-exist. We then propose a neural Multi-temporal-range Mixture Model (M3) as a tailored solution for dealing with both short-term and long-term dependencies. Our approach employs a mixture of models, each with a different temporal range, combined by a learned gating mechanism capable of exerting different model combinations given different contextual information. In empirical evaluations on a public dataset and our own anonymized YouTube dataset, M3 consistently outperforms state-of-the-art sequential recommendation methods.
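A minimal sketch of such a gated mixture (dimensions and expert details invented, not the paper's architecture): encodings from sub-models specialized for different temporal ranges are combined by a context-conditioned softmax gate.

```python
# Context-gated mixture over temporal-range experts; dimensions are toy.
import torch
import torch.nn as nn

d_ctx, d_out, n_experts = 8, 32, 3
gate = nn.Linear(d_ctx, n_experts)

def mix(expert_outputs, context):
    """expert_outputs: list of (B, d_out) encodings, one per temporal range
    (e.g. a last-item model, a short-range model, a long-range RNN)."""
    weights = torch.softmax(gate(context), dim=-1)      # (B, n_experts)
    stacked = torch.stack(expert_outputs, dim=1)        # (B, n_experts, d_out)
    return (weights.unsqueeze(-1) * stacked).sum(dim=1) # (B, d_out)
```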
    Abstract: Recurrent neural networks have gained widespread use in modeling sequential data. However, learning long-term dependencies with these models remains difficult due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. Under this theoretical framework, we propose a special form of recurrent network called the AntisymmetricRNN, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computational overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments: AntisymmetricRNN exhibits much more predictable dynamics, outperforms regular LSTM models on tasks requiring long-term memory, and matches their performance on tasks where short-term dependencies dominate, despite being much simpler.
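A compact sketch of the cell update as commonly stated for this architecture (hyperparameter values here are illustrative): the recurrent matrix is constrained to the antisymmetric form W − Wᵀ, whose purely imaginary eigenvalues keep the underlying ODE stable; γ adds mild diffusion and ε is the Euler-discretization step size.

```python
# Sketch of an AntisymmetricRNN cell: an Euler step of a stable ODE.
import torch
import torch.nn as nn

class AntisymmetricRNNCell(nn.Module):
    def __init__(self, d_in, d_hid, eps=0.01, gamma=0.01):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_hid, d_hid) * 0.1)
        self.V = nn.Linear(d_in, d_hid)
        self.eps, self.gamma = eps, gamma

    def forward(self, x, h):
        # Antisymmetric recurrent matrix with slight diffusion for stability.
        A = self.W - self.W.t() - self.gamma * torch.eye(self.W.size(0))
        return h + self.eps * torch.tanh(h @ A.t() + self.V(x))

cell = AntisymmetricRNNCell(d_in=4, d_hid=8)
h = torch.zeros(1, 8)
for t in range(100):              # hidden state stays well-behaved over time
    h = cell(torch.randn(1, 4), h)
```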