Zhe Chen

Authored Publications

    In recent years, deep neural network (DNN) models have achieved stellar performance in many domains. However, ML practitioners and researchers have observed severe reproducibility issues with DNN models: a set of models trained on the same data with exactly the same architecture may produce quite different predictions. A common remedy is the ensemble method, which quantifies prediction variation and improves model reproducibility. However, an ensemble makes multiple predictions for each input and is computationally expensive, especially when serving web-scale traffic at inference time. In this paper, we seek to advance our understanding of prediction variation. We demonstrate that neuron activation strength can be used to infer prediction variation. Through empirical experiments on two widely used benchmark datasets, MovieLens and Criteo, we observe that prediction variation arises from several sources of randomness, including training data shuffling and the random initialization of model and embedding parameters. By adding more randomness sources to model training, we find that the ensemble method tends to produce more accurate predictions with higher prediction variation. Finally, we demonstrate that neuron activation strength has strong predictive power for inferring ensemble prediction variation. Our approach provides a cheap and simple way to estimate prediction variation, laying a foundation and opening up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning and active learning) without having to rely on expensive ensemble models.
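    The idea lends itself to a compact illustration. The sketch below is an assumption-laden toy, not the paper's implementation: it trains a small ensemble of MLPs on synthetic data, where members differ only in random seed and data shuffling, measures per-example prediction variation as the standard deviation of ensemble predictions, and then fits a least-squares regressor from simple activation-strength statistics (mean and max hidden activation) of a single model to that variation. The architecture, feature choices, and data are illustrative; the paper studies recommender models on MovieLens and Criteo.

    ```python
    # Toy sketch (hypothetical setup, not the paper's code): infer ensemble
    # prediction variation from one model's neuron activation strength.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(2000, 16)                      # synthetic features
    y = (X.sum(dim=1, keepdim=True) > 0).float()   # synthetic binary labels

    class MLP(nn.Module):
        def __init__(self, seed):
            super().__init__()
            torch.manual_seed(seed)                # per-member random init
            self.hidden = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
            self.out = nn.Linear(32, 1)

        def forward(self, x):
            h = self.hidden(x)                     # activations whose strength we use
            return torch.sigmoid(self.out(h)), h

    def train(model, steps=200):
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(steps):
            perm = torch.randperm(len(X))          # data-shuffling randomness
            p, _ = model(X[perm])
            loss = nn.functional.binary_cross_entropy(p, y[perm])
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model

    ensemble = [train(MLP(seed)) for seed in range(5)]

    with torch.no_grad():
        preds = torch.stack([m(X)[0] for m in ensemble])   # (5, N, 1)
        variation = preds.std(dim=0).squeeze(1)            # ensemble prediction variation
        _, h = ensemble[0](X)                              # one model's activations
        # simple activation-strength features: mean, max, and a bias column
        feats = torch.stack([h.mean(dim=1), h.amax(dim=1), torch.ones(len(X))], dim=1)
        # cheap closed-form regressor from activation strength to variation
        w = torch.linalg.lstsq(feats, variation.unsqueeze(1)).solution
        est = (feats @ w).squeeze(1)
        corr = torch.corrcoef(torch.stack([est, variation]))[0, 1]
        print(f"correlation(activation-strength estimate, ensemble variation) = {corr:.2f}")
    ```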
    Multitask Mixture of Sequential Experts for User Activity Streams
    Yicheng Cheng
    Zhe Zhao
    Jingzheng Qin
    26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2020)
    Multi-task deep learning has been an actively researched topic, and it is used in many real-world systems for user activity and content recommendation. While most multi-task model architectures proposed to date focus on non-sequential input features (e.g., query and context), input data is often sequential in real-world web applications. For example, user behavior streams, such as user search logs in search systems, are naturally temporal sequences. Modeling sequential user behaviors as explicit sequential representations can empower the multi-task model to incorporate temporal dependencies and thus predict future user behavior more accurately. In this work, we study the challenging problem of how to model sequential user behavior in neural multi-task learning settings. Our major contribution is a novel framework, Mixture of Sequential Experts (MoSE), which explicitly models sequential user behavior using Long Short-Term Memory (LSTM) within the state-of-the-art Multi-gate Mixture-of-Experts (MMoE) multi-task modeling framework. In experiments, we show the effectiveness of the MoSE architecture over seven alternative architectures on both synthetic and noisy real-world user data in Google Apps. We also demonstrate the effectiveness and flexibility of the MoSE architecture in a real-world decision-making engine in Gmail by trading off between search quality and resource costs.
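    As a reading aid, here is a minimal sketch of the MoSE pattern the abstract describes: LSTM experts each summarize the user activity sequence, and each task has its own softmax gate that mixes expert outputs before a task-specific tower, following the multi-gate mixture-of-experts design. All dimensions, the gate input, and layer choices here are illustrative assumptions, not the paper's configuration.

    ```python
    # Illustrative MoSE-style model (assumed toy dimensions, not the paper's code).
    import torch
    import torch.nn as nn

    class MoSE(nn.Module):
        def __init__(self, input_dim=8, hidden_dim=16, n_experts=4, n_tasks=2):
            super().__init__()
            # sequential experts: each expert is an LSTM over the activity stream
            self.experts = nn.ModuleList(
                [nn.LSTM(input_dim, hidden_dim, batch_first=True) for _ in range(n_experts)]
            )
            # one gate per task (multi-gate), here fed by the last raw input step
            self.gates = nn.ModuleList(
                [nn.Linear(input_dim, n_experts) for _ in range(n_tasks)]
            )
            # one output tower per task
            self.towers = nn.ModuleList(
                [nn.Linear(hidden_dim, 1) for _ in range(n_tasks)]
            )

        def forward(self, seq):                    # seq: (batch, time, input_dim)
            # each expert summarizes the sequence with its last-step output
            expert_outs = torch.stack(
                [lstm(seq)[0][:, -1, :] for lstm in self.experts], dim=1
            )                                      # (batch, n_experts, hidden_dim)
            last_step = seq[:, -1, :]
            outputs = []
            for gate, tower in zip(self.gates, self.towers):
                weights = torch.softmax(gate(last_step), dim=-1).unsqueeze(-1)
                mixed = (weights * expert_outs).sum(dim=1)   # per-task expert mixture
                outputs.append(tower(mixed))
            return outputs                         # one prediction per task

    model = MoSE()
    batch = torch.randn(32, 10, 8)                 # 32 users, 10-step activity streams
    task_a, task_b = model(batch)
    print(task_a.shape, task_b.shape)              # torch.Size([32, 1]) for each task
    ```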