get-started-machine-learning(Python)

Get started: Build your first machine learning model on Databricks

This example notebook illustrates how to train a machine learning classification model on Databricks. Databricks Runtime for Machine Learning comes with many libraries pre-installed, including scikit-learn for training and pre-processing algorithms, MLflow to track the model development process, and Hyperopt with SparkTrials to scale hyperparameter tuning.

In this notebook, you create a classification model to predict whether a wine is considered "high-quality". The dataset[1] consists of 11 features of different wines (for example, alcohol content, acidity, and residual sugar) and a quality ranking from 1 to 10.

This tutorial covers:

  • Part 1: Train a classification model with MLflow tracking
  • Part 2: Hyperparameter tuning to improve model performance
  • Part 3: Save results and models to Unity Catalog

For more details on productionizing machine learning on Databricks, including model lifecycle management and model inference, see the ML End to End Example (AWS | Azure | GCP).

[1] The example uses a dataset from the UCI Machine Learning Repository, presented in Modeling wine preferences by data mining from physicochemical properties [Cortez et al., 2009].

Requirements

  • Cluster running Databricks Runtime 13.3 LTS ML or above

Setup

In this section, you do the following:

  • Configure the MLflow client to use Unity Catalog as the model registry.
  • Set the catalog and schema where the model will be registered.
  • Read in the data and save it to tables in Unity Catalog.
  • Preprocess the data.

Configure MLflow client

By default, the MLflow Python client creates models in the Databricks workspace model registry. To save models in Unity Catalog, configure the MLflow client as shown in the following cell.

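A minimal sketch of that configuration:

```python
import mlflow

# Point the MLflow client at Unity Catalog instead of the
# workspace model registry.
mlflow.set_registry_uri("databricks-uc")
```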

The following cell sets the catalog and schema where the model will be registered. You must have the USE CATALOG privilege on the catalog, and the USE SCHEMA, CREATE TABLE, and CREATE MODEL privileges on the schema. Change the catalog and schema names in the following cell if necessary.

For more information about how to use Unity Catalog, see the Unity Catalog documentation (AWS | Azure | GCP).

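A sketch of that setup; `main` and `default` are placeholder names used in the rest of the sketches below, so substitute a catalog and schema you have access to:

```python
# Placeholder names -- replace with a catalog and schema where you hold
# the privileges listed above.
CATALOG_NAME = "main"
SCHEMA_NAME = "default"
```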

Read in data and save it to tables in Unity Catalog

The dataset is available in databricks-datasets. In the following cell, you read the data in from .csv files into Spark DataFrames. You then write the DataFrames to tables in Unity Catalog. This both persists the data and lets you control how to share it with others.

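A sketch of that step, assuming the standard wine-quality CSVs that ship under `/databricks-datasets/` (semicolon-delimited, with a header row). Column names are normalized because Unity Catalog table columns cannot contain spaces:

```python
# Read the red and white wine CSVs into Spark DataFrames.
white_wine = spark.read.csv(
    "dbfs:/databricks-datasets/wine-quality/winequality-white.csv",
    sep=";", header=True, inferSchema=True,
)
red_wine = spark.read.csv(
    "dbfs:/databricks-datasets/wine-quality/winequality-red.csv",
    sep=";", header=True, inferSchema=True,
)

# Replace spaces in column names with underscores, then persist each
# DataFrame as a Unity Catalog table.
for table_name, df in [("white_wine", white_wine), ("red_wine", red_wine)]:
    for col in df.columns:
        df = df.withColumnRenamed(col, col.replace(" ", "_"))
    df.write.mode("overwrite").saveAsTable(
        f"{CATALOG_NAME}.{SCHEMA_NAME}.{table_name}"
    )
```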

Preprocess data

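A sketch of the preprocessing cells; the binary "high-quality" cutoff (quality of 7 or above) and the 80/20 train/test split are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the tables back and combine them into one pandas DataFrame,
# tagging each row with the wine type.
white_df = spark.read.table(f"{CATALOG_NAME}.{SCHEMA_NAME}.white_wine").toPandas()
red_df = spark.read.table(f"{CATALOG_NAME}.{SCHEMA_NAME}.red_wine").toPandas()
white_df["is_red"] = 0
red_df["is_red"] = 1
data = pd.concat([white_df, red_df], axis=0)

# Turn the 1-10 quality score into a binary "high quality" label
# (a cutoff at 7 is an assumption for this sketch).
data["quality"] = (data["quality"] >= 7).astype(int)

# Hold out 20% of the rows as a test set.
X = data.drop("quality", axis=1)
y = data["quality"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)
```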

Part 1. Train a classification model

Enable MLflow autologging

Next, enable MLflow autologging and train a classifier within the context of an MLflow run; autologging automatically records the trained model and many associated metrics and parameters.

You can supplement the logging with additional metrics such as the model's AUC score on the test dataset.
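
A minimal sketch of turning autologging on:

```python
import mlflow

# With autologging enabled, scikit-learn parameters, training metrics,
# and the fitted model itself are captured on each run automatically.
mlflow.autolog()
```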

Train the model
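
A sketch of the training cell; the random forest and its settings are illustrative choices, and `X_train`, `y_train`, `X_test`, and `y_test` come from the preprocessing sketch above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import mlflow

with mlflow.start_run(run_name="untuned_random_forest"):
    model = RandomForestClassifier(n_estimators=100, random_state=123)
    model.fit(X_train, y_train)

    # Supplement the autologged metrics with AUC on the held-out test set.
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("test_auc", test_auc)
```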

View MLflow runs

To view the logged training run, click the Experiment icon at the upper right of the notebook to display the experiment sidebar. If necessary, click the refresh icon to fetch and monitor the latest runs.

To display the more detailed MLflow experiment page, click the experiment page icon. This page allows you to compare runs and view details for specific runs (AWS | Azure | GCP).

Load models

You can also access the results for a specific run using the MLflow API. The code in the following cell illustrates how to load the model trained in a given MLflow run and use it to make predictions. You can also find code snippets for loading specific models on the MLflow run page (AWS | Azure | GCP).

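A sketch of loading a logged model by run ID and scoring with it; `mlflow.last_active_run()` is used here as one way to pick up the run from the previous cell:

```python
import mlflow

# Build a runs:/ URI for the model logged in the most recent run,
# load it as a generic pyfunc model, and make predictions.
run_id = mlflow.last_active_run().info.run_id
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
predictions = loaded_model.predict(X_test)
```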

Part 2. Hyperparameter tuning

At this point, you have trained a simple model and used the MLflow tracking service to organize your work. Next, you can perform more sophisticated tuning using Hyperopt.

Parallel training with Hyperopt and SparkTrials

Hyperopt is a Python library for hyperparameter tuning. For more information about using Hyperopt in Databricks, see the documentation (AWS | Azure | GCP).

You can use Hyperopt with SparkTrials to run hyperparameter sweeps and train multiple models in parallel. This reduces the time required to optimize model performance. MLflow tracking is integrated with Hyperopt to automatically log models and parameters.

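A sketch of a sweep; the search space, parallelism, and evaluation budget are illustrative, and each trial logs its own `test_auc` to a nested MLflow run so that Part 3 can find the best one:

```python
from hyperopt import STATUS_OK, SparkTrials, fmin, hp, tpe
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import mlflow

# Illustrative search space over tree count and depth.
search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 10),
    "max_depth": hp.quniform("max_depth", 2, 20, 1),
}

def objective(params):
    # Each evaluation runs on a Spark worker and is logged as a nested run.
    with mlflow.start_run(nested=True):
        model = RandomForestClassifier(
            n_estimators=int(params["n_estimators"]),
            max_depth=int(params["max_depth"]),
            random_state=123,
        )
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        mlflow.log_metric("test_auc", auc)
        # Hyperopt minimizes the loss, so negate the AUC.
        return {"loss": -auc, "status": STATUS_OK}

# SparkTrials fans the trials out across the cluster's workers.
spark_trials = SparkTrials(parallelism=4)
with mlflow.start_run(run_name="hyperopt_sweep"):
    best_params = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=16,
        trials=spark_trials,
    )
```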

Search runs to retrieve the best model

Because MLflow tracks all of the runs, you can use the MLflow search runs API to retrieve the metrics and parameters for the best run, that is, the tuning run with the highest test AUC.

This tuned model should perform better than the simpler models trained in Part 1.

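A sketch using `mlflow.search_runs`, ordering by the `test_auc` metric logged in the sweep sketch above:

```python
import mlflow

# Return runs as a pandas DataFrame, sorted so the best run comes first.
best_run = mlflow.search_runs(
    order_by=["metrics.test_auc DESC"], max_results=1
).iloc[0]
print(f"Best run id: {best_run.run_id}, test AUC: {best_run['metrics.test_auc']:.3f}")
```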

Part 3. Save results and models to Unity Catalog

Write results back to Unity Catalog
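
A sketch of writing the best model's test-set predictions to a table; `best_run` comes from the search above, and the table name is a placeholder:

```python
import mlflow

# Load the best model and attach its predictions to the test features.
best_model = mlflow.pyfunc.load_model(f"runs:/{best_run.run_id}/model")
results = X_test.copy()
results["prediction"] = best_model.predict(X_test)

# Persist the scored rows as a Unity Catalog table so they can be
# shared and queried with SQL.
spark.createDataFrame(results).write.mode("overwrite").saveAsTable(
    f"{CATALOG_NAME}.{SCHEMA_NAME}.wine_quality_predictions"
)
```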

Save model to Unity Catalog
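
A sketch of registering the best run's model under a three-level Unity Catalog name (the model name here is a placeholder):

```python
import mlflow

# Register the model from the best run. With the registry URI set to
# Unity Catalog earlier, the name must be catalog.schema.model.
model_uri = f"runs:/{best_run.run_id}/model"
mlflow.register_model(model_uri, f"{CATALOG_NAME}.{SCHEMA_NAME}.wine_quality_model")
```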