Hui Miao
Hui Miao has been at Google since Aug 2018. He contributes to the TensorFlow Extended (TFX) platform, with research interests at the intersection of data management and machine learning. His work focuses on data and provenance systems for continuous machine learning pipelines. Prior to joining Google, he completed his Ph.D. at the University of Maryland, College Park, advised by Prof. Amol Deshpande. His dissertation research focused on lifecycle management systems for collaborative data science workflows.
Research Areas
Authored Publications
Machine learning (ML) is now commonplace, powering data-driven applications in a host of industries and organizations. Unlike the traditional perception of ML in research, ML production pipelines are complex, with many interlocking analytical components beyond training, whose sub-parts are often run multiple times, over overlapping subsets of data. However, there is a lack of quantitative evidence regarding the lifespan, architecture, frequency, and complexity of these pipelines to understand how data management research can be used to make them more efficient, effective, robust, and reproducible. To that end, we analyze the provenance graphs of over 10K production ML pipelines at Google spanning a period of over four months, in an effort to understand the complexity and challenges underlying production ML. Our analysis reveals the characteristics, components, and topologies of typical industry-strength ML pipelines at various granularities. Along the way, we introduce a new specialized data model for representing and reasoning about repeatedly run components (or sub-pipelines) of these ML pipelines, which we call model graphlets. We identify several rich opportunities for optimization, leveraging traditional data management ideas. We show how targeting even one of these opportunities, i.e., that of identifying and pruning wasted computation that doesn't translate to deployment, can reduce overall computation costs by between 30-50% without compromising the overall freshness of deployed models.
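The wasted-computation idea in the abstract can be sketched in a few lines: represent pipeline runs as nodes in a provenance graph, then flag runs whose outputs never reach a deployment. This is a minimal illustration, not the paper's actual data model; all names (`Run`, `deployed`, `cost`) are hypothetical.

```python
# Hypothetical sketch of pruning wasted computation in an ML-pipeline
# provenance graph: a run is "wasted" if neither it nor anything
# downstream of it ever produced a deployed model.
from dataclasses import dataclass, field

@dataclass
class Run:
    name: str
    cost: float                 # e.g., compute hours for this component run
    deployed: bool = False      # True if this run pushed a model to serving
    downstream: list = field(default_factory=list)  # runs consuming outputs

def reaches_deployment(run, seen=None):
    """A run is useful if it, or any downstream run, was deployed."""
    seen = seen if seen is not None else set()
    if id(run) in seen:
        return False
    seen.add(id(run))
    if run.deployed:
        return True
    return any(reaches_deployment(d, seen) for d in run.downstream)

def wasted_fraction(runs):
    """Share of total cost spent on runs that never fed a deployed model."""
    total = sum(r.cost for r in runs)
    wasted = sum(r.cost for r in runs if not reaches_deployment(r))
    return wasted / total if total else 0.0

# Tiny example: two training runs; only one leads to a deployed model.
deploy = Run("pusher", cost=1.0, deployed=True)
good = Run("train_v2", cost=10.0, downstream=[deploy])
dead = Run("train_v1", cost=10.0)   # retrained but never pushed
print(wasted_fraction([good, dead, deploy]))  # → ~0.476
```

Pruning runs like `train_v1` before they execute is what the abstract's 30-50% cost-reduction estimate refers to.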
Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform
Denis M. Baylor
Kevin Haas
Sammy W Leong
Rose Liu
Clemens Mewald
Neoklis Polyzotis
Mitch Trott
Marty Zinkevich
In Proceedings of USENIX OpML 2019
Large organizations rely increasingly on continuous ML pipelines in order to keep machine-learned models continuously up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX to support this type of pipeline in production, and the lessons learned from deploying the platform internally at Google.
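The staleness concern described above can be made concrete with a simple freshness check: a continuous pipeline should alert when the deployed model was trained too long before the newest available data. This is a hedged illustration, not a TFX API; the function name and freshness budget are assumptions.

```python
# Illustrative model-staleness monitor (not TFX code): flag a deployed
# model whose training time trails the newest data span by more than a
# configurable freshness budget, as happens during pipeline disruptions.
from datetime import datetime, timedelta

def model_is_stale(latest_data_time, deployed_model_time,
                   budget=timedelta(hours=24)):
    """True if the deployed model lags the newest data beyond the budget."""
    return latest_data_time - deployed_model_time > budget

now = datetime(2019, 5, 1, 12, 0)
print(model_is_stale(now, now - timedelta(hours=30)))  # → True (stale)
print(model_is_stale(now, now - timedelta(hours=6)))   # → False (fresh)
```

A real continuous pipeline would drive retraining or paging from such a signal rather than just printing it.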