The notebook below is the first of six notebooks demonstrating how to perform distributed training with TensorFlowOnSpark on the MNIST dataset. Download the full set of notebooks or see the TensorFlowOnSpark guide for more information.
This notebook demonstrates how to download data from S3 and build a data ingest pipeline (which loads training data from disk into in-memory tensors) using the tf.data API in TensorFlow. In this example, training data is loaded from TFRecords files. If you would instead like to work with data processed in Spark, consider using the Spark-TensorFlow connector to persist your DataFrames as TFRecords files, then load them into TensorFlow with a workflow similar to the one described below.
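The following is a minimal sketch of a tf.data ingest pipeline that reads TFRecords files into batched tensors. It is not the notebook's own code. It assumes TensorFlow 2.4 or later with eager execution (the original TensorFlowOnSpark notebooks may use graph-mode TF1 APIs instead). The feature keys `image` and `label`, their encodings, and the helper names `write_fake_tfrecords`, `parse_example`, and `make_dataset` are hypothetical. The records your S3 download or the Spark-TensorFlow connector produces may use different keys and types. The sketch writes a few random MNIST-shaped records to a temporary file so it runs without downloading anything.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical schema (assumption): a flattened 28x28 image stored as 784
# floats, plus an integer class label. Adjust to match your actual records.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([784], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def write_fake_tfrecords(path, num_examples=10):
    """Write random MNIST-shaped tf.train.Example records (stand-in for real data)."""
    rng = np.random.default_rng(0)
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(num_examples):
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(
                    float_list=tf.train.FloatList(value=rng.random(784).tolist())),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[i % 10])),
            }))
            writer.write(example.SerializeToString())


def parse_example(serialized):
    """Decode one serialized tf.train.Example into an (image, label) pair."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    image = tf.reshape(parsed["image"], [28, 28, 1])
    return image, parsed["label"]


def make_dataset(filenames, batch_size=4, shuffle_buffer=100):
    """Read TFRecords files, parse, shuffle, batch, and prefetch."""
    ds = tf.data.TFRecordDataset(filenames)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.batch(batch_size)
    # Prefetching overlaps input preparation with model execution.
    return ds.prefetch(tf.data.AUTOTUNE)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")
    write_fake_tfrecords(path, num_examples=10)
    for images, labels in make_dataset([path], batch_size=4).take(1):
        print(images.shape, labels.shape)
```

Parsing with `map` before `shuffle` and `batch` keeps each transformation per-example and lets tf.data parallelize decoding. In a real pipeline you would pass the local paths of the downloaded TFRecords files to `make_dataset` in place of the temporary file.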
The next stage of the TensorFlowOnSpark training pipeline is to construct a TensorFlow graph for distributed model training. For more information, see Constructing the Model Graph.