Projecting Disk Usage Based on Historical Trends in a Cloud Environment

Murray Stokely

Amaan Mehrabian

Christoph Albrecht

Francois Labelle

Arif Merchant

ScienceCloud 2012 Proceedings of the 3rd International Workshop on Scientific Cloud Computing, ACM, pp. 63-70

Google Scholar

Abstract

Provisioning scarce resources among competing users and jobs remains one of the primary challenges of operating large-scale, distributed computing environments. Distributed storage systems, in particular, typically rely on hard operator-set quotas to control disk allocation and enforce isolation for space and I/O bandwidth among disparate users. However, users and operators are very poor at predicting future requirements and, as a result, tend to over-provision grossly. For three years, we collected detailed usage information for data stored in distributed filesystems in a large private cloud spanning dozens of clusters on multiple continents. Specifically, we measured the disk space usage, I/O rate, and age of stored data for thousands of different engineering users and teams. We find that although the individual timeseries often have non-stable usage trends, regional aggregations, user classification, and ensemble forecasting methods can be combined to provide a more accurate prediction of future use for the majority of users. We applied this methodology for the storage users in one geographic region and back-tested these techniques over the past three years to compare our forecasts against actual usage. We find that by classifying a small subset of users with unforecastable trend changes due to known product launches, we can generate three-month out forecasts with mean absolute errors of less than ~12%. This compares favorably to the amount of allocated but unused quota that is generally wasted with manual operator-set quotas.

Research Areas

Distributed Systems and Parallel Computing

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Projecting Disk Usage Based on Historical Trends in a Cloud Environment

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Projecting Disk Usage Based on Historical Trends in a Cloud Environment

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities