Konstantinos Katsiapis

Konstantinos (Gus) is a senior accountable lead for the Waymo Foundation Model, and focuses on data, infrastructure and evaluations. He also founded and runs Waymo’s engineering education in areas of software engineering, large scale data engineering and machine learning (ML) engineering.

Before joining Waymo, he spent more than a decade working in applied ML at Google. He was introduced to ML infrastructure as an avid user of it while leading Mobile Display Ads Quality. He then transitioned to helping build Sibyl and later served as the über tech lead of its successor, TensorFlow Extended (TFX). These were Google’s most widely used end-to-end ML platforms at the time.

Prior to Google, Gus gathered knowledge and experience at Amazon, Calian, Ontario Ministry of Finance, Independent Electricity System Operator, and Computron.

Gus earned a master's degree in computer science with a specialization in artificial intelligence from Stanford University. He also earned a bachelor's degree in mathematics, majoring in computer science and minoring in economics, from the University of Waterloo.
Authored Publications
    Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX)
    Abhijit Karmarkar
    Ahmet Altay
    Aleksandr Zaks
    Anusha Ramesh
    Jarek Wilkiewicz
    Jiri Simsa
    Justin Hong
    Mitch Trott
    Neoklis Polyzotis
    Noé Lutz
    Robert Crowe
    Sarah Sirajuddin
    Zhitao Li
    (2020)
    Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering is a necessary blessing. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more and more for research, experimentation and production workloads. ML now commonly powers widely-used products integral to our lives. But ML Engineering, as a discipline, has not widely matured as much as its Software Engineering ancestor. Can we take what we have learned and help the nascent field of applied ML evolve into ML Engineering the way Programming evolved into Software Engineering [book]? In this article we will give a whirlwind tour of Sibyl [article] and TensorFlow Extended (TFX) [website], two successive end-to-end (E2E) ML platforms at Alphabet. We will share the lessons learned from over a decade of applied ML built on these platforms, explain both their similarities and their differences, and expand on the shifts (both mental and technical) that helped us on our journey. In addition, we will highlight some of the capabilities of TFX that help realize several aspects of ML Engineering. We argue that in order to unlock the gains ML can bring, organizations should advance the maturity of their ML teams by investing in robust ML infrastructure and promoting ML Engineering education. We also recommend that before focusing on cutting-edge ML modeling techniques, product leaders should invest more time in adopting interoperable ML platforms for their organizations. In closing, we will also share a glimpse into the future of TFX.
    Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform
    Denis M. Baylor
    Kevin Haas
    Sammy W Leong
    Rose Liu
    Clemens Mewald
    Neoklis Polyzotis
    Mitch Trott
    Marty Zinkevich
    In proceedings of USENIX OpML 2019
    Large organizations rely increasingly on continuous ML pipelines in order to keep machine-learned models continuously up-to-date with respect to data. In this scenario, disruptions in the pipeline can increase model staleness and thus degrade the quality of downstream services supported by these models. In this paper we describe the operation of continuous pipelines in the TensorFlow Extended (TFX) platform that we developed and deployed at Google. We present the main mechanisms in TFX to support this type of pipeline in production and the lessons learned from the deployment of the platform internally at Google.