Launching Distributed Model Training

The notebook below is the fourth of six notebooks that demonstrate distributed training with TensorFlowOnSpark on the MNIST dataset. You can download the full set of notebooks, or see the TensorFlowOnSpark guide for more information. This notebook uses the helpers defined in the previous notebooks to download data from S3 and build the model graph, then calls the TensorFlowOnSpark APIs to launch distributed model training on the Spark workers.
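
For orientation, the launch step described above might look roughly like the following minimal sketch. It is not the notebook's actual code. It assumes the training data is read by TensorFlow itself (hence InputMode.TENSORFLOW), and the names plan_cluster, launch_training, and batch_size are hypothetical, introduced only for illustration. The body of map_fun is a placeholder for the data-loading and graph-building helpers from the earlier notebooks.

```python
from types import SimpleNamespace


def plan_cluster(num_executors, num_ps=1):
    """Hypothetical helper: split Spark executors into parameter servers and workers."""
    if num_ps < 0 or num_ps >= num_executors:
        raise ValueError("need at least one worker besides the parameter servers")
    return {"ps": num_ps, "worker": num_executors - num_ps}


def map_fun(args, ctx):
    """Runs once on each executor; ctx describes this node's role in the cluster."""
    # Placeholder: the real notebook would call the helpers from the earlier
    # notebooks here to download data from S3 and build the model graph.
    print("node %s:%d starting, batch_size=%d"
          % (ctx.job_name, ctx.task_index, args.batch_size))


def launch_training(sc, num_executors, num_ps=1):
    """Start a TensorFlowOnSpark cluster on the Spark workers and wait for it."""
    # Imported lazily so the sketch can be loaded without Spark installed.
    from tensorflowonspark import TFCluster

    plan_cluster(num_executors, num_ps)      # fail fast on an invalid layout
    args = SimpleNamespace(batch_size=100)   # assumed hyperparameter
    cluster = TFCluster.run(sc, map_fun, args, num_executors, num_ps,
                            tensorboard=False,
                            input_mode=TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()                       # waits for the worker nodes to finish
```

In a notebook attached to a running cluster, a call such as launch_training(sc, num_executors=4, num_ps=1) would start one parameter server and three workers under these assumptions.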

Next Steps

  • See Model Evaluation to learn how to run model evaluation concurrently with model training.
  • See TensorBoard to learn how to visualize training and validation metrics (e.g., loss, accuracy) in real time with TensorBoard.