Using SSDs to scale up Google Fusion Tables, a Database-in-the-Cloud
Abstract
Flash memory solid state drives (SSDs) have increasingly been advocated and adopted as a means of speeding up and scaling up data-driven applications. SSDs are becoming more widely available as an option in the cloud. However, when an application considers SSDs in the cloud, the best option for the application may not be immediate, among a number of choices for placing SSDs in the layers of the cloud. Although there have been many studies on SSDs, they often concern a specific setting, and how different SSD options in the cloud compare with each other is less well understood. In this paper, we describe how Google Fusion Tables (GFT) used SSDs and what optimizations were implemented to scale up its in-memory processing, clearly showing opportunities and limitations of SSDs in the cloud with quantitative analyses. We first discuss various SSD placement strategies and compare them with low-level measurements, and propose SSD-placement guidelines for a variety of cloud data services. We then present internals of our column engine and optimizations to better use the performance characteristics of SSDs. We empirically demonstrate that the optimizations enable us to scale our application to much larger datasets while retaining the low-latency and simple query processing architecture.