In recent years, deep neural network (DNN) models have achieved stellar performance in many domains. However, ML practitioners and researchers have observed severe reproducibility issues in DNN models: a set of DNN models trained on the same data with exactly the same architecture may produce quite different predictions. A common remedy is to use an ensemble to quantify the prediction variation and improve model reproducibility. However, an ensemble makes multiple predictions for each input, which is computationally expensive, especially when serving web-scale traffic at inference time.
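The abstract does not specify how the ensemble is built. As an illustration only, the following is a minimal sketch, assuming a toy logistic-regression model in NumPy as a stand-in for a DNN, of how an ensemble quantifies prediction variation: K members share the same data and architecture and differ only in their random seed (which drives parameter initialization and data shuffling), and the per-example standard deviation of their predictions is taken as the variation. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_logreg(X, y, seed, epochs=20, lr=0.1, batch_size=32):
    """Train a toy logistic-regression model with mini-batch SGD.

    The seed controls two sources of randomness: the random parameter
    initialization and the per-epoch shuffling of the training data.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])  # random initialization
    b = 0.0
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)  # random data shuffling
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-(X[idx] @ w + b)))
            grad = p - y[idx]  # gradient of log loss w.r.t. the logit
            w -= lr * X[idx].T @ grad / len(idx)
            b -= lr * grad.mean()
    return w, b

def predict(model, X):
    w, b = model
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def ensemble_predict(models, X):
    """Return the per-example ensemble mean and standard deviation.

    Note the cost: every input needs one forward pass per member.
    """
    preds = np.stack([predict(m, X) for m in models])  # shape (K, N)
    return preds.mean(axis=0), preds.std(axis=0)

if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    X = data_rng.normal(size=(500, 5))
    true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
    y = (X @ true_w + data_rng.normal(scale=0.5, size=500) > 0).astype(float)

    # K = 5 members: same data, same architecture, different seeds.
    models = [train_logreg(X, y, seed=k) for k in range(5)]
    mean, std = ensemble_predict(models, X[:3])
    print("ensemble mean:", np.round(mean, 3))
    print("prediction std:", np.round(std, 3))
```

Serving such an ensemble multiplies inference cost by K, which is the expense the abstract refers to.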
In this paper, we seek to advance our understanding of prediction variation, and we demonstrate that neuron activation strength can be used to infer it. Through empirical experiments on two widely used benchmark datasets, MovieLens and Criteo, we observe that prediction variation arises from multiple sources of randomness, including training data shuffling and the random initialization of model and embedding parameters. We also find that, as more sources of randomness are added to model training, the ensemble tends to produce more accurate predictions along with higher prediction variation. Last but not least, we demonstrate that neuron activation strength has strong predictive power for inferring the ensemble prediction variation. Our approach provides a cheap and simple way to estimate prediction variation, which lays the foundation for future work in many areas (e.g., model-based reinforcement learning and active learning) without having to rely on expensive ensemble models.