We present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a ``region graph" over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations which are temporally coherent with stable region boundaries. Additionally, the resulting segmentation hierarchy allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow when constructing the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency. We can segment video shots as long as 40 seconds without compromising quality, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.