Google hostload prediction based on Bayesian model with optimized feature combination

Sheng Di
Derrick Kondo
Journal of Parallel and Distributed Computing (2014)

Abstract

We design a novel prediction method based on a Bayes model to predict a load fluctuation pattern over a
long-term interval, in the context of Google data centers. We exploit a set of features that capture the
expectation, trend, stability and patterns of recent host loads. We also investigate the correlations among
these features and explore the most effective combinations of features with various training periods. All of
the prediction methods are evaluated using a Google trace with 10,000+ heterogeneous hosts. Experiments
show that our Bayes method improves the long-term load prediction accuracy by 5.6%–50%, compared
to other state-of-the-art methods based on moving average, auto-regression, and/or noise filters. The mean
squared error of pattern prediction with the Bayes method stays approximately within $[10^{-8}, 10^{-5}]$.
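As a rough illustration of feature-based Bayesian load prediction, here is a minimal Python sketch. It is not the authors' method. It assumes loads normalized to [0, 1] and uses a naive Bayes classifier (features treated as conditionally independent) over three illustrative features: the window mean (expectation), the least-squares slope (trend) and the standard deviation (stability). The future mean load is discretized into a hypothetical number of equal-width states (`N_LEVELS`), and the prediction is the posterior mean of the state midpoints. All names, window lengths and bin thresholds are hypothetical.

```python
import numpy as np

N_LEVELS = 5  # hypothetical number of equal-width load states in [0, 1]


def discretize(x, n=N_LEVELS):
    """Map a load value in [0, 1] to an integer state 0..n-1."""
    return int(min(int(x * n), n - 1))


def feature_states(window):
    """Discretized features of a recent-load window (illustrative choices)."""
    t = np.arange(len(window))
    mean = window.mean()                    # expectation
    slope = np.polyfit(t, window, 1)[0]     # trend: least-squares slope
    std = window.std()                      # stability
    return (
        discretize(mean),
        int(np.digitize(slope, [-0.002, 0.002])),  # 0 falling, 1 flat, 2 rising
        int(np.digitize(std, [0.05, 0.15])),       # 0 stable, 1 moderate, 2 volatile
    )


def make_dataset(load, h, s):
    """Pair each h-step history with the mean load of the next s steps."""
    X, y, target = [], [], []
    for t in range(h, len(load) - s + 1):
        future_mean = load[t:t + s].mean()
        X.append(feature_states(load[t - h:t]))
        y.append(discretize(future_mean))
        target.append(future_mean)
    return np.array(X), np.array(y), np.array(target)


class NaiveBayesLoadPredictor:
    """Naive Bayes over discretized features (conditional independence assumed)."""

    def __init__(self, cards, n_levels=N_LEVELS, alpha=1.0):
        self.cards = cards  # number of values each feature can take
        self.n = n_levels
        self.alpha = alpha  # Laplace smoothing

    def fit(self, X, y):
        prior = np.bincount(y, minlength=self.n) + self.alpha
        self.log_prior = np.log(prior / prior.sum())
        self.log_cond = []  # log_cond[j][c, v] = log P(feature j = v | state c)
        for j, card in enumerate(self.cards):
            counts = np.full((self.n, card), self.alpha)
            np.add.at(counts, (y, X[:, j]), 1)
            self.log_cond.append(np.log(counts / counts.sum(axis=1, keepdims=True)))
        return self

    def posterior(self, x):
        """P(future state | observed feature states), normalized."""
        log_p = self.log_prior + sum(self.log_cond[j][:, v] for j, v in enumerate(x))
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    def predict(self, x):
        """Posterior-mean estimate of the future mean load, using state midpoints."""
        midpoints = (np.arange(self.n) + 0.5) / self.n
        return float(self.posterior(x) @ midpoints)


# Synthetic trace only: a noisy sinusoid standing in for one host's load.
rng = np.random.default_rng(0)
steps = np.arange(2000)
load = np.clip(0.5 + 0.3 * np.sin(steps / 50) + rng.normal(0, 0.05, steps.size), 0, 1)
X, y, target = make_dataset(load, h=20, s=10)
model = NaiveBayesLoadPredictor(cards=(N_LEVELS, 3, 3)).fit(X[:1500], y[:1500])
pred = np.array([model.predict(x) for x in X[1500:]])
mse = float(np.mean((pred - target[1500:]) ** 2))  # mean squared error on held-out windows
print(f"test MSE: {mse:.4f}")
```

The synthetic sinusoid only exercises the code; the resulting MSE says nothing about the error levels reported for the Google trace. Predicting a posterior mean rather than the single most probable state keeps the output continuous, which suits a squared-error metric.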
Through a load-balancing scenario, we confirm that the precision of pattern prediction in finding a set of
idlest/busiest hosts among 10,000+ hosts can be improved by about 7% on average.
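To make the idlest/busiest-host experiment concrete, here is a small Python sketch of one plausible way to score such a selection. It assumes that precision means the fraction of the k hosts selected from predicted loads that also appear among the k hosts selected from observed loads. The paper's exact metric may differ, and the helper names and sample values are hypothetical.

```python
import numpy as np


def idlest(loads, k):
    """Indices of the k hosts with the lowest load."""
    return set(np.argsort(loads, kind="stable")[:k].tolist())


def busiest(loads, k):
    """Indices of the k hosts with the highest load."""
    return set(np.argsort(loads, kind="stable")[-k:].tolist())


def selection_precision(predicted, actual, k, pick):
    """Share of the k hosts chosen from predicted loads that are truly in the top k."""
    return len(pick(predicted, k) & pick(actual, k)) / k


# Hypothetical predicted vs. observed mean loads for six hosts.
predicted = [0.10, 0.80, 0.30, 0.55, 0.20, 0.90]
actual = [0.15, 0.70, 0.25, 0.60, 0.40, 0.95]
print(selection_precision(predicted, actual, 2, idlest))   # 0.5
print(selection_precision(predicted, actual, 2, busiest))  # 1.0
```

For 10,000+ hosts, `np.argpartition` would avoid a full sort, but a stable full sort keeps tie handling deterministic.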
