Get started with Databricks as a machine learning engineer

The quickstarts and tutorials listed here are designed to get you started quickly with machine learning on Databricks. Each includes a notebook that you can import and run in your own Databricks workspace. They illustrate how to use Databricks throughout the machine learning lifecycle, including data loading and preparation; model training, tuning, and inference; and model deployment and management. They demonstrate helpful tools such as Hyperopt for automated hyperparameter tuning, MLflow tracking and autologging for model development, and Model Registry for model management.

Note

To run a notebook included in any of these tutorials, click Copy link for import above the notebook on the tutorial page. In your Databricks workspace browser, select Import from the menu of any folder and paste the URL. You need a cluster to run the notebook on. For more information about creating clusters and running notebooks, see Get started with Databricks as a data scientist.

For users new to Databricks

If you are new to Databricks Machine Learning, the best way to start is to:

  1. Follow the Get started with Databricks as a data scientist quickstart.

  2. Run the in-product quickstart notebook included in the Databricks Machine Learning environment.

    This notebook illustrates many of the benefits of using Databricks for machine learning, including tracking model development with MLflow and parallelizing hyperparameter tuning runs. The notebook walks you through how to load data, train and tune a model, compare and analyze model performance, and use the model for inference.

To run the in-product quickstart notebook:

  1. Log in to your Databricks workspace and go to the Databricks Machine Learning persona-based environment.

    To change the persona, click the icon below the Databricks logo, and select Machine Learning.

  2. On the Databricks Machine Learning start page, click Start guide at the upper right.

scikit-learn tutorials

| Notebook | Requirements | Features |
|---|---|---|
| Machine learning quickstart | Databricks Runtime 7.5 ML or above | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| Machine learning with Model Registry | Databricks Runtime 7.0 ML or above | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, Model Registry |
| End-to-end example | Databricks Runtime 6.5 ML or above | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |

Apache Spark MLlib tutorial

| Notebook | Requirements | Features |
|---|---|---|
| Machine learning with MLlib | Databricks Runtime 5.5 LTS ML or above | Logistic regression model, Spark pipeline, automated hyperparameter tuning using MLlib API |

Deep learning tutorial

| Notebook | Requirements | Features |
|---|---|---|
| Deep learning with TensorFlow Keras | Databricks Runtime 7.0 ML or above | Neural network model, inline TensorBoard, automated hyperparameter tuning with Hyperopt and MLflow, autologging, Model Registry |
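
To check which of the tutorial notebooks a given cluster can run, the minimum runtime versions listed above can be compared against the cluster's runtime version. The following is a minimal sketch using only the Python standard library. The names `MIN_RUNTIME`, `parse_runtime`, and `runnable_notebooks` are hypothetical helpers, not part of any Databricks API, and the sketch compares version numbers only; it does not verify that the cluster uses the ML variant of the runtime.

```python
import re
from typing import List, Tuple

# Minimum Databricks Runtime ML versions, copied from the tutorial tables.
MIN_RUNTIME = {
    "Machine learning quickstart": (7, 5),
    "Machine learning with Model Registry": (7, 0),
    "End-to-end example": (6, 5),
    "Machine learning with MLlib": (5, 5),
    "Deep learning with TensorFlow Keras": (7, 0),
}


def parse_runtime(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '7.3 LTS ML'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def runnable_notebooks(version: str) -> List[str]:
    """Return the notebooks whose minimum version is met by `version`."""
    current = parse_runtime(version)
    # Tuples compare element by element, so (7, 3) >= (7, 0) but not >= (7, 5).
    return [name for name, minimum in MIN_RUNTIME.items() if current >= minimum]


if __name__ == "__main__":
    for name in runnable_notebooks("7.3 LTS ML"):
        print(name)
```

Comparing `(major, minor)` tuples rather than raw strings avoids ordering mistakes such as treating "10.4" as lower than "7.5".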