Google Cluster Data

January 7, 2010

Posted by Joseph L. Hellerstein, Manager of Google Performance Analytics

Google faces many technical challenges in the evolution of its applications and infrastructure. In particular, as we increase the size of our compute clusters and scale the work that they process, many issues arise in how to schedule the diverse work that runs on Google systems.

We have distilled these challenges into the following research topics that we feel are interesting to the academic community and important to Google:
  • Workload characterizations: How can we characterize Google workloads in a way that readily generates synthetic work that is representative of production workloads, so that we can run stand-alone benchmarks?
  • Predictive models of workload characteristics: What distinguishes a normal workload from an abnormal one? Are there "signals" that can indicate problems within a time frame that allows for automated and/or manual responses?
  • New algorithms for machine assignment: How can we assign tasks to machines so that we make best use of machine resources, avoid excess resource contention on machines, and manage power efficiently?
  • Scalable management of cell work: How should we design the future cell management system to efficiently visualize work in cells, to aid in problem determination, and to provide automation of management tasks?
To aid researchers in addressing these questions in a realistic manner, we will provide data from Google production systems. The initial focus of these data will be workload characterization. Details of the data can be found here. The data are structured as follows:
  • Time (int) - Time in seconds since the start of data collection
  • JobID (int) - Unique identifier of the job to which this task belongs
  • TaskID (int) - Unique identifier of the executing task
  • Job Type (0, 1, 2, 3) - Class of job (a categorization of work)
  • Normalized Task Cores (float) - Normalized value of the average number of cores used by the task
  • Normalized Task Memory (float) - Normalized value of the average memory consumed by the task
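
As a starting point for workload characterization, the record layout above could be read into a small typed structure and summarized per job type. The following is only a minimal sketch in Python: it assumes each record is one line of whitespace-separated values in the field order listed above, which this post does not specify, and the names TaskSample, parse_line, and mean_usage_by_job_type are hypothetical, not part of the data release.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TaskSample:
    """One record, with fields in the order listed in the post."""
    time: int            # seconds since the start of data collection
    job_id: int          # job to which the task belongs
    task_id: int         # the executing task
    job_type: int        # class of job: 0, 1, 2, or 3
    norm_cores: float    # normalized average cores used
    norm_memory: float   # normalized average memory consumed


def parse_line(line: str) -> TaskSample:
    # ASSUMPTION: whitespace-separated columns in the listed order.
    # The real file layout may differ; check the data documentation.
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    time, job_id, task_id, job_type = (int(f) for f in fields[:4])
    if job_type not in (0, 1, 2, 3):
        raise ValueError(f"unexpected job type: {job_type}")
    return TaskSample(time, job_id, task_id, job_type,
                      float(fields[4]), float(fields[5]))


def mean_usage_by_job_type(
    samples: Iterable[TaskSample],
) -> Dict[int, Tuple[float, float]]:
    """Return {job_type: (mean normalized cores, mean normalized memory)}."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # cores sum, memory sum, count
    for s in samples:
        acc = totals[s.job_type]
        acc[0] += s.norm_cores
        acc[1] += s.norm_memory
        acc[2] += 1
    return {jt: (c / n, m / n) for jt, (c, m, n) in totals.items()}


if __name__ == "__main__":
    # Invented rows for illustration only; these are not real trace data.
    rows = [
        "0 10 100 1 0.25 0.5",
        "300 10 100 1 0.75 0.5",
        "0 11 200 2 0.125 0.25",
    ]
    print(mean_usage_by_job_type(parse_line(r) for r in rows))
```

Per-class means are about the simplest characterization one could compute; distributions over time or per job would follow the same pattern, grouping on Time or JobID instead of Job Type.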
We solicit your feedback in terms of: (a) the quality and content of the data we are providing; (b) technical approaches and/or results related to the topics above; and (c) other research topics that you feel Google should be addressing in the area of Cloud Computing (along with details of the data required to address these topics).