TensorFlow Debugger: Debugging Dataflow Graphs for Machine Learning
Abstract
Debuggability is important in the development of machine-learning (ML) systems.
Several widely-used ML libraries, such as TensorFlow and Theano, are based on
dataflow graphs. While offering important benefits such as facilitating distributed
training, the dataflow graph paradigm makes debugging model issues more
challenging than debugging in the more conventional procedural paradigm.
In this paper, we present the design of the TensorFlow Debugger (tfdbg), a specialized
debugger for ML models written in TensorFlow. tfdbg provides features
to inspect runtime dataflow graphs and the state of the intermediate graph elements
("tensors"), as well as to simulate stepping through the graph. We discuss the
application of this debugger in development and testing use cases.
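To make the idea of inspecting intermediate tensors and stepping through a graph concrete, here is a minimal sketch in plain Python. It is not tfdbg's API or implementation. The toy graph format, `step_through`, `has_inf_or_nan`, `first_bad_tensor`, and `EXAMPLE_GRAPH` are hypothetical names chosen for illustration. The sketch assumes each node is a scalar op over named inputs. Nodes run one at a time in topological order, and each intermediate value is yielded, so a caller can inspect the state between steps or stop at the first value that is infinite or NaN.

```python
import math
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical toy graph format (NOT tfdbg's API):
# node name -> (op, list of input node names). Names that are not keys
# of the graph are treated as fed inputs (placeholders).
Node = Tuple[Callable[..., float], List[str]]


def topological_order(graph: Dict[str, Node]) -> List[str]:
    """Return node names so that every node comes after its inputs."""
    order: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        # Fed inputs are not graph keys, so they have no dependencies.
        for dep in graph[name][1] if name in graph else []:
            visit(dep)
        order.append(name)

    for name in graph:
        visit(name)
    return order


def step_through(graph: Dict[str, Node],
                 feeds: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """Execute one node per step, yielding (name, value) after each one.

    Pausing after every node mimics stepping: the caller sees each
    intermediate value that a whole-graph run would otherwise hide.
    """
    values: Dict[str, float] = dict(feeds)
    for name in topological_order(graph):
        if name not in graph:  # a fed input; its value is already known
            yield name, values[name]
            continue
        op, inputs = graph[name]
        values[name] = op(*(values[i] for i in inputs))
        yield name, values[name]


def has_inf_or_nan(value: float) -> bool:
    """Hypothetical check for numerically bad intermediate values."""
    return math.isinf(value) or math.isnan(value)


def first_bad_tensor(graph: Dict[str, Node],
                     feeds: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Step until the first intermediate value that is inf or NaN."""
    for name, value in step_through(graph, feeds):
        if has_inf_or_nan(value):
            return name, value
    return None


# Example graph: "big" overflows to inf for large x; "diff" would then be NaN.
EXAMPLE_GRAPH: Dict[str, Node] = {
    "big": (lambda x: x * 1e300, ["x"]),
    "diff": (lambda a, b: a - b, ["big", "big"]),
    "out": (lambda d: d + 1.0, ["diff"]),
}

if __name__ == "__main__":
    for name, value in step_through(EXAMPLE_GRAPH, {"x": 2.0}):
        print(name, value)
    print(first_bad_tensor(EXAMPLE_GRAPH, {"x": 1e10}))
```

With `x = 1e10`, the product in `big` overflows to infinity. The search then reports `big` rather than the downstream NaN in `diff`. Locating the earliest bad node is the kind of question that is hard to answer when only the graph's final outputs are visible.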