Jarek Wilkiewicz
Jarek is the Director of Technical Program Management at Google DeepMind. Jarek joined Google in 2010, has a MS in Software Management from Carnegie Mellon University and BS Computer Science with concentration on AI from The University of Memphis. Prior to joining Google, he worked on building telecommunications systems software at Hewlett Packard, BEA Systems, Alcatel and a couple of startups.
Research Areas
Authored Publications
Sort By
Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX)
Abhijit Karmarkar
Ahmet Altay
Aleksandr Zaks
Anusha Ramesh
Jiri Simsa
Justin Hong
Mitch Trott
Neoklis Polyzotis
Noé Lutz
Robert Crowe
Sarah Sirajuddin
Zhitao Li
(2020)
Preview abstract
Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering is a necessary blessing. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more and more for research, experimentation and production workloads. ML now commonly powers widely-used products integral to our lives.
But ML Engineering, as a discipline, has not widely matured as much as its Software Engineering ancestor. Can we take what we have learned and help the nascent field of applied ML evolve into ML Engineering the way Programming evolved into Software Engineering [book]?
In this article we will give a whirlwind tour of Sibyl [article] and TensorFlow Extended (TFX) [website], two successive end-to-end (E2E) ML platforms at Alphabet. We will share the lessons learned from over a decade of applied ML built on these platforms, explain both their similarities and their differences, and expand on the shifts (both mental and technical) that helped us on our journey. In addition, we will highlight some of the capabilities of TFX that help realize several aspects of ML Engineering. We argue that in order to unlock the gains ML can bring, organizations should advance the maturity of their ML teams by investing in robust ML infrastructure and promoting ML Engineering education. We also recommend that before focusing on cutting-edge ML modeling techniques, product leaders should invest more time in adopting interoperable ML platforms for their organizations. In closing, we will also share a glimpse into the future of TFX.
View details
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Akshay Naresh Modi
Chiu Yuen Koo
Chuan Yu Foo
Clemens Mewald
Denis M. Baylor
Levent Koc
Lukasz Lew
Martin A. Zinkevich
Mustafa Ispir
Neoklis Polyzotis
Steven Whang
Sudip Roy
Sukriti Ramesh
Vihan Jain
Xin Zhang
KDD 2017
Preview abstract
Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components—a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.
View details