Machine learning and deep learning guide

Databricks provides an environment for building, training, managing, and deploying machine learning and deep learning models at scale. It integrates with popular open-source libraries and with MLflow, an open-source platform for managing the machine learning lifecycle, to support every stage from data preparation to deployment.

The Databricks Runtime for Machine Learning (Databricks Runtime ML) is a ready-to-go environment optimized for machine learning and data science. Databricks Runtime ML is built on top of Databricks Runtime and is updated with every Databricks Runtime release.

Databricks Runtime ML includes many popular open-source libraries, such as TensorFlow, PyTorch, Horovod, scikit-learn, and XGBoost. It also provides extensions that improve performance, such as GPU acceleration in XGBoost, distributed deep learning with HorovodRunner, and model checkpointing through a Databricks File System (DBFS) FUSE mount.
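Because DBFS is exposed through a FUSE mount (on Databricks clusters it appears under the local path `/dbfs`), training code can write and read checkpoints with ordinary file APIs instead of a storage-specific client. The sketch below assumes that mount point and uses `pickle` for a plain-Python model state. The helpers `dbfs_uri_to_fuse_path`, `save_checkpoint`, and `load_latest_checkpoint` are hypothetical names for illustration, and the `fuse_root` parameter exists only so the sketch can run off-cluster against a temporary directory. With a real framework you would write its native checkpoint format (for example, `torch.save` for PyTorch) to the same kind of path.

```python
import os
import pickle
import tempfile
from typing import Optional

# Assumption: on a Databricks cluster, DBFS is FUSE-mounted at /dbfs.
DEFAULT_FUSE_ROOT = "/dbfs"


def dbfs_uri_to_fuse_path(uri: str, fuse_root: str = DEFAULT_FUSE_ROOT) -> str:
    """Map a dbfs:/ URI to a local path under the FUSE mount (hypothetical helper)."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {uri!r}")
    # Strip extra leading slashes so os.path.join keeps fuse_root as the base.
    return os.path.join(fuse_root, uri[len(prefix):].lstrip("/"))


def save_checkpoint(state: dict, step: int, ckpt_dir: str) -> str:
    """Write one checkpoint file per training step using ordinary file I/O."""
    os.makedirs(ckpt_dir, exist_ok=True)
    # Zero-padded step numbers make lexical order match numeric order.
    path = os.path.join(ckpt_dir, f"ckpt-{step:06d}.pkl")
    with open(path, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    return path


def load_latest_checkpoint(ckpt_dir: str) -> Optional[dict]:
    """Return the most recent checkpoint, or None if none exist yet."""
    if not os.path.isdir(ckpt_dir):
        return None
    files = sorted(
        name for name in os.listdir(ckpt_dir)
        if name.startswith("ckpt-") and name.endswith(".pkl")
    )
    if not files:
        return None
    with open(os.path.join(ckpt_dir, files[-1]), "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Off-cluster demo: use a temporary directory in place of /dbfs.
    with tempfile.TemporaryDirectory() as root:
        ckpt_dir = dbfs_uri_to_fuse_path("dbfs:/ml/checkpoints/run-1", fuse_root=root)
        for step in range(3):
            save_checkpoint({"weights": [0.1 * step]}, step, ckpt_dir)
        latest = load_latest_checkpoint(ckpt_dir)
        print(latest["step"])  # prints 2
```

On a cluster you would call `dbfs_uri_to_fuse_path` without `fuse_root`, so checkpoints land in DBFS and survive the cluster being terminated, which is what makes resuming a long training job possible.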

To use Databricks Runtime ML, select the ML version of the runtime when you create your cluster.

  • You can install additional libraries on a cluster, or use init scripts to install libraries automatically when the cluster is created.
  • You can create GPU-enabled clusters to accelerate deep learning tasks. For information about creating GPU-enabled Databricks clusters, see GPU-enabled clusters. Databricks Runtime ML includes GPU hardware drivers and NVIDIA libraries such as CUDA.
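As a sketch of the cluster-creation steps above, the function below builds a request body for the Databricks Clusters API `clusters/create` endpoint. It assumes that the ML runtime is selected through the `spark_version` field, where ML runtime keys contain `-cpu-ml-` and GPU variants contain `-gpu-ml-` (for example `7.3.x-gpu-ml-scala2.12`). The version number, node types (`i3.xlarge`, `g4dn.xlarge`), cluster name, and init-script path are placeholders, and the DBFS-hosted init script is an assumption about where your script lives. Check the runtime keys your workspace actually offers (the Clusters API `spark-versions` endpoint lists them) before relying on this.

```python
import json
from typing import Optional


def ml_runtime_key(version: str, gpu: bool, scala: str = "2.12") -> str:
    """Build an ML runtime key such as '7.3.x-cpu-ml-scala2.12' (assumed naming scheme)."""
    flavor = "gpu-ml" if gpu else "cpu-ml"
    return f"{version}-{flavor}-scala{scala}"


def cluster_create_payload(
    name: str,
    version: str,
    num_workers: int,
    gpu: bool = False,
    init_script: Optional[str] = None,
) -> dict:
    """Request body for POST /api/2.0/clusters/create using an ML runtime."""
    payload = {
        "cluster_name": name,
        "spark_version": ml_runtime_key(version, gpu),
        # Placeholder AWS node types; substitute types available in your workspace.
        "node_type_id": "g4dn.xlarge" if gpu else "i3.xlarge",
        "num_workers": num_workers,
    }
    if init_script is not None:
        # Cluster-scoped init script, e.g. one that installs additional libraries.
        payload["init_scripts"] = [{"dbfs": {"destination": init_script}}]
    return payload


if __name__ == "__main__":
    body = cluster_create_payload(
        "ml-gpu-demo",
        "7.3.x",
        num_workers=2,
        gpu=True,
        init_script="dbfs:/databricks/scripts/install-extra-libs.sh",
    )
    print(json.dumps(body, indent=2))
```

You would send this body in an authenticated POST to `https://<workspace-url>/api/2.0/clusters/create`. The sketch stops at building the JSON so that it runs without credentials or network access.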

This section includes information and examples for machine learning and deep learning workflows, including data loading, feature engineering, model training, hyperparameter tuning, model inference, and model deployment and export. Many of the examples also illustrate the benefits of using MLflow to track, manage, and deploy machine learning workflows.