Single node Keras to distributed deep learning

For Keras, Databricks recommends tf.keras, TensorFlow’s implementation of the Keras API specification.


The keras package uses the HDF5 format to save model checkpoints. HDF5 is incompatible with the optimized FUSE mount at file:/dbfs/ml and with Goofys, because both support only sequential writes. If you use the keras package, you can save model checkpoints to a local directory and then copy them to persistent storage.
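A minimal sketch of this save-locally-then-copy workaround follows. The helper name save_then_copy and the destination path are hypothetical. save_fn stands in for whatever call writes the HDF5 file, for example a Keras model's save method given a filename ending in .h5.

```python
import os
import shutil
import tempfile


def save_then_copy(save_fn, filename, persistent_dir):
    """Save a checkpoint to local disk, then copy it to persistent storage.

    save_fn: callable that writes a checkpoint to the path it receives
             (assumption: e.g. a Keras model's ``save`` method).
    filename: checkpoint file name, e.g. "checkpoint.h5".
    persistent_dir: destination directory on persistent storage
                    (hypothetical example: "/dbfs/ml/my_project/checkpoints").
    Returns the path of the copied checkpoint.
    """
    # Scratch directory on local disk, where the HDF5 writer is not
    # restricted to sequential writes.
    local_dir = tempfile.mkdtemp()
    try:
        local_path = os.path.join(local_dir, filename)
        save_fn(local_path)

        # Copy the finished file in a single pass; a plain file copy
        # writes the bytes in order.
        os.makedirs(persistent_dir, exist_ok=True)
        dest_path = os.path.join(persistent_dir, filename)
        shutil.copyfile(local_path, dest_path)
        return dest_path
    finally:
        # Remove the local scratch copy whether or not saving succeeded.
        shutil.rmtree(local_dir, ignore_errors=True)
```

Assuming model is a compiled Keras model, a call might look like save_then_copy(model.save, "checkpoint.h5", "/dbfs/ml/my_project/checkpoints"). Taking a callable keeps the helper independent of the framework, so the same pattern works for any writer that needs random-access writes.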

The tf.keras package uses the TensorFlow checkpoint format, which does not have this limitation.

The notebook below follows our recommended development workflow.

HorovodRunner TensorFlow and Keras MNIST example notebook