Multi-task deep learning has been an actively researched topic, and it has been used in many real-world systems for user activities and content recommendation.
While most of the multi-task model architectures proposed to date focus on using non-sequential input features (e,g. query and context), input data is often sequential in real-world web application scenarios. For example, user behavior streams, such as user search logs in search systems, are naturally a temporal sequence. Modeling user sequential behaviors as explicit sequential representations can empower the multi-task model to incorporate temporal dependencies, thus predicting future user behavior more accurately.
In this work, we study the challenging problem of how to model sequential user behavior in the neural multi-task learning settings. Our major contribution is a novel framework, Mixture of Sequential Experts (MoSE). It explicitly models sequential user behavior using Long Short-Term Memory (LSTM) in the state-of-art Multi-gate Mixture-of-Expert multi-task modeling framework. In experiments, we show the effectiveness of the MoSE architecture over seven alternative architectures on both synthetic and noisy real-world user data in Google Apps. We also demonstrate the effectiveness and flexibility of the MoSE architecture in a real-world decision making engine in GMail, by trading off between search quality and resource costs.