Prepare Storage for Data Loading and Model Checkpointing

Data loading and model checkpointing are crucial to deep learning workloads, especially distributed ones. Databricks Runtime supports high-performance data I/O in different ways, depending on the version:

  • Databricks Runtime 6.0 and above: Databricks provides a high-performance FUSE mount.
  • Databricks Runtime 5.5 LTS: Databricks provides dbfs:/ml, a special folder that offers high-performance I/O for deep learning workloads and maps to file:/dbfs/ml on driver and worker nodes. Databricks recommends using Databricks Runtime 5.4 or above and saving data under /dbfs/ml. This FUSE mount also works around the Databricks Runtime limitation that local file I/O APIs support only files smaller than 2 GB.
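
Because the FUSE mount exposes storage as an ordinary local path, checkpoints can be written with standard file APIs. The following is a minimal sketch, not an official Databricks API: the helper names save_checkpoint and load_checkpoint, the checkpoints subfolder, and the use of pickle are illustrative assumptions. Only the /dbfs/ml location comes from the text above.

```python
import os
import pickle

# Assumption: on a Databricks cluster, dbfs:/ml is visible on driver and
# worker nodes as the local path /dbfs/ml (per the description above).
DEFAULT_CHECKPOINT_ROOT = "/dbfs/ml"


def save_checkpoint(state, name, root=DEFAULT_CHECKPOINT_ROOT):
    """Serialize `state` to <root>/checkpoints/<name> using local file I/O.

    Hypothetical helper for illustration; `state` can be any picklable
    object, such as a dict holding an epoch counter and model weights.
    """
    ckpt_dir = os.path.join(root, "checkpoints")
    os.makedirs(ckpt_dir, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(ckpt_dir, name)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_checkpoint(path):
    """Read back a checkpoint written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

On a cluster, a call such as save_checkpoint({"epoch": 3}, "epoch_3.pkl") would write under /dbfs/ml/checkpoints. Passing a different root (for example, a temporary directory) lets the same code run outside Databricks for testing. Deep learning frameworks that accept a local file path for their own save and load functions can usually be pointed at the same location.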