Machine Learning Design Patterns
Abstract
In engineering disciplines, best practices and solutions to commonly occurring problems are captured in the form of design patterns. Design patterns codify the experience of hundreds of experts into advice that all practitioners can follow. As ML becomes more mainstream, it is important that practitioners take advantage of tried-and-proven methods to address recurring problems. However, there is no collection of proven design patterns in machine learning. This book remedies that.
This book is a catalog of design patterns, that is, repeatable solutions to commonly occurring problems in ML engineering. For example, the Transform pattern enforces the separation of inputs, features, and transforms, and makes the transformations persistent, in order to simplify moving an ML model to production. Similarly, Keyed Predictions is a pattern that enables the large-scale distribution of batch predictions, such as for recommendation models.
For each pattern, we describe the commonly occurring problem that is being addressed, walk through a variety of potential solutions and their tradeoffs, and recommend how to choose between them.
Implementation code for these solutions is provided in SQL (useful if you are carrying out preprocessing and other ETL in Spark SQL, BigQuery, etc.) and Keras.
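To give a flavor of the Transform pattern mentioned above, here is a minimal sketch in plain Python rather than the SQL or Keras used in the book. It assumes a single numeric input column and a simple standard-score scaling; the column name `trip_km`, the helper functions `fit_transform_params` and `apply_transform`, and the use of a JSON string in place of a saved file are all hypothetical choices made for illustration. The point is only that the transform is learned from raw inputs, persisted, and then reapplied unchanged at serving time.

```python
import json
import statistics


def fit_transform_params(raw_rows, column):
    """Learn scaling parameters from raw training inputs (hypothetical helper)."""
    values = [row[column] for row in raw_rows]
    return {
        "column": column,
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def apply_transform(raw_row, params):
    """Turn one raw input row into a feature using the saved parameters."""
    scaled = (raw_row[params["column"]] - params["mean"]) / params["stdev"]
    return {params["column"] + "_scaled": scaled}


# Training time: learn the transform from raw inputs and persist it
# alongside the model (a JSON string stands in for a saved file here).
train_rows = [{"trip_km": 2.0}, {"trip_km": 4.0}, {"trip_km": 6.0}]
params = fit_transform_params(train_rows, "trip_km")
saved = json.dumps(params)

# Serving time: reload the persisted transform and pass in raw inputs,
# so clients never need to know how features are computed.
restored = json.loads(saved)
feature = apply_transform({"trip_km": 4.0}, restored)
print(feature)
```

Because the serving path reads the same persisted parameters that training produced, the raw input, the feature, and the transform stay separate, and there is no second copy of the scaling logic to drift out of sync.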