machine-learning-with-unity-catalog(Python)


Databricks ML Tutorial: Model Training

This notebook provides a quick overview of machine learning model training on Databricks. To train models, you can use libraries like scikit-learn that are preinstalled on the Databricks Runtime for Machine Learning. In addition, you can use MLflow to track the trained models, and Hyperopt with SparkTrials to scale hyperparameter tuning.

This tutorial covers:

  • Part 1: Training a simple classification model with MLflow tracking
  • Part 2: Tuning hyperparameters with Hyperopt to get a better-performing model
  • Part 3: Saving results and models to Unity Catalog

For more details on productionizing machine learning on Databricks, including model lifecycle management and model inference, see the ML End to End Example (AWS | Azure | GCP).

The example uses a dataset from the UCI Machine Learning Repository, presented in "Modeling wine preferences by data mining from physicochemical properties" [Cortez et al., 2009].

Requirements

  • Cluster running Databricks Runtime 15.4 LTS ML or above
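The runtime requirement can be checked from inside the notebook. The sketch below assumes that Databricks clusters expose the runtime version in the DATABRICKS_RUNTIME_VERSION environment variable (for example "15.4"); the helper names are hypothetical, and the check does not confirm that the cluster uses the ML flavor of the runtime.

```python
import os
import re

# Minimum version from the requirements above: Databricks Runtime 15.4 LTS ML.
MIN_RUNTIME = (15, 4)


def parse_runtime_version(value: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as "15.4" or "15.4.x-cpu-ml-scala2.12"."""
    match = re.match(r"(\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {value!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(value: str, minimum: tuple[int, int] = MIN_RUNTIME) -> bool:
    # Tuples compare element by element: major version first, then minor.
    return parse_runtime_version(value) >= minimum


# Assumption: Databricks clusters set this variable; elsewhere it is usually absent.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if runtime is None:
    print("DATABRICKS_RUNTIME_VERSION is not set, so the runtime cannot be checked here.")
else:
    print(f"Runtime {runtime} meets the 15.4 requirement: {meets_requirement(runtime)}")
```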

Requirement already satisfied: mlflow-skinny[databricks] in /databricks/python3/lib/python3.11/site-packages (2.15.1)
Collecting mlflow-skinny[databricks]
Downloading mlflow_skinny-2.17.0-py3-none-any.whl (5.7 MB)
Installing collected packages: mlflow-skinny
Attempting uninstall: mlflow-skinny
Found existing installation: mlflow-skinny 2.15.1
Not uninstalling mlflow-skinny at /databricks/python3/lib/python3.11/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-cfaad166-c4f1-42d7-89b8-2913776511ae
Can't uninstall 'mlflow-skinny'. No files were found to uninstall.
Successfully installed mlflow-skinny-2.17.0
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

Unity Catalog setup

By default, the MLflow Python client creates models in the Databricks workspace model registry. To save models in Unity Catalog, configure the MLflow client as shown in the following cell.

The following cell sets the catalog and schema where the model will be registered. You must have the USE CATALOG privilege on the catalog, and the USE SCHEMA, CREATE TABLE, and CREATE MODEL privileges on the schema. Change the catalog and schema names in the following cell if necessary.

For more information about how to use Unity Catalog, see the Unity Catalog documentation (AWS | Azure | GCP).
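If you administer the catalog, you can grant these privileges with SQL. The sketch below only builds the GRANT statements as strings, so it runs anywhere; in a notebook an admin could pass each one to spark.sql. The principal someone@example.com is a hypothetical placeholder, and the values main and default match the catalog and schema of the model registered at the end of this notebook.

```python
CATALOG_NAME = "main"     # catalog of the model registered in Part 3
SCHEMA_NAME = "default"   # schema of the model registered in Part 3


def grant_statements(catalog: str, schema: str, principal: str) -> list[str]:
    """Build Unity Catalog GRANT statements for the privileges this tutorial needs."""
    quoted = f"`{principal}`"  # backticks let the principal contain '@' and '.'
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA, CREATE TABLE, CREATE MODEL ON SCHEMA {catalog}.{schema} TO {quoted}",
    ]


# Hypothetical principal; replace with a real user or group name.
for statement in grant_statements(CATALOG_NAME, SCHEMA_NAME, "someone@example.com"):
    print(statement)
    # In a Databricks notebook, an admin could run: spark.sql(statement)
```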

Write data to Unity Catalog tables

The dataset is available in databricks-datasets. In the following cell, you read the data from .csv files into Spark DataFrames. You then write the DataFrames to tables in Unity Catalog. This both persists the data and lets you control how to share it with others.
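Like the UCI originals, the wine-quality files are semicolon-delimited, and their header names contain spaces (for example "fixed acidity"). Delta tables reject spaces in column names unless column mapping is enabled, so one common approach is to replace spaces with underscores before writing. The sketch below shows that renaming step with the standard csv module on a small, illustrative in-memory sample (not the full dataset); in the notebook, the same step would run on Spark DataFrames, for example with withColumnRenamed, before DataFrame.write.saveAsTable.

```python
import csv
import io

# Illustrative sample in the same layout as the wine-quality files:
# semicolon-delimited, with quoted header names that contain spaces.
SAMPLE = '''"fixed acidity";"volatile acidity";"pH";"alcohol";"quality"
7.4;0.7;3.51;9.4;5
7.8;0.88;3.2;9.8;5
'''


def sanitize(name: str) -> str:
    """Replace spaces so the name is a valid Delta column name without column mapping."""
    return name.strip().replace(" ", "_")


def read_wine_csv(text: str) -> list[dict[str, str]]:
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [sanitize(col) for col in next(reader)]
    # Values stay strings, as when Spark reads a CSV without schema inference.
    return [dict(zip(header, row)) for row in reader]


rows = read_wine_csv(SAMPLE)
print(list(rows[0]))  # ['fixed_acidity', 'volatile_acidity', 'pH', 'alcohol', 'quality']
```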

Load data from Unity Catalog and preprocess it
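As an illustration only, the sketch below assumes a common preprocessing setup for this dataset: combine the red and white tables, add an is_red indicator feature, turn the integer quality score into a binary label (quality of 7 or more treated as high quality), and hold out 20% of the rows for testing with a fixed seed. The threshold, split ratio, seed, and function names are assumptions, not values confirmed by this notebook, and plain Python lists stand in for the pandas DataFrames a notebook would normally use.

```python
import random

QUALITY_THRESHOLD = 7  # assumption: a quality score of 7 or more counts as "high quality"
TEST_FRACTION = 0.2    # assumption: 80/20 train/test split


def combine_and_label(white: list[dict], red: list[dict]) -> tuple[list[dict], list[bool]]:
    """Merge both wine tables, add an is_red feature, and separate out the label."""
    features, labels = [], []
    for is_red, table in ((0.0, white), (1.0, red)):
        for source_row in table:
            row = dict(source_row)  # copy so the input rows are not mutated
            quality = int(row.pop("quality"))  # drop the raw score so it cannot leak into the features
            row["is_red"] = is_red
            features.append(row)
            labels.append(quality >= QUALITY_THRESHOLD)
    return features, labels


def _pick(seq: list, idx: list[int]) -> list:
    return [seq[i] for i in idx]


def split_train_test(features, labels, test_fraction=TEST_FRACTION, seed=1):
    """Shuffle row indices with a fixed seed, then cut off the test portion."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (_pick(features, train_idx), _pick(features, test_idx),
            _pick(labels, train_idx), _pick(labels, test_idx))


# Made-up rows; only the "quality" key matters for labeling.
white = [{"alcohol": "9.0", "quality": "6"}, {"alcohol": "12.1", "quality": "8"}] * 3
red = [{"alcohol": "9.4", "quality": "5"}, {"alcohol": "11.0", "quality": "7"}] * 2
X, y = combine_and_label(white, red)
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test), sum(y))  # 8 2 5
```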

Part 1. Train a classification model

2024/10/22 19:03:51 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2024/10/22 19:03:51 WARNING mlflow.spark: With Pyspark >= 3.2, PYSPARK_PIN_THREAD environment variable must be set to false for Spark datasource autologging to work.
2024/10/22 19:03:51 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.
2024/10/22 19:03:51 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.ml.

Next, train a classifier within the context of an MLflow run. Because autologging is enabled, MLflow automatically logs the trained model and many associated metrics and parameters.

You can supplement the logging with additional metrics such as the model's AUC score on the test dataset.

Test AUC of: 0.8834365701533531
2024/10/22 19:04:01 INFO mlflow.tracking._tracking_service.client: 🏃 View run gradient_boost at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130/runs/38b61d29418a4be4987123fcf4e1bc4b.
2024/10/22 19:04:01 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130.

Test AUC of: 0.8914761673151751
2024/10/22 19:04:15 INFO mlflow.tracking._tracking_service.client: 🏃 View run gradient_boost at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130/runs/33e6fab73b9943f5b34fc6b355cea369.
2024/10/22 19:04:15 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130.
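To make the Test AUC values printed above concrete: ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pure-Python sketch below computes it exactly that way, for intuition only; in practice you would use sklearn.metrics.roc_auc_score on the predicted probabilities and record the result with mlflow.log_metric. The scores in the example are made up.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Pairwise O(P * N) version for clarity; a tie between a positive and a
    negative score counts as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined unless both classes are present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up predicted probabilities for the positive ("high quality") class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

Here three of the four positive/negative pairs are ordered correctly (only 0.35 versus 0.4 is not), which gives 0.75.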

View MLflow runs

To view the logged training run, click the Experiment icon at the upper right of the notebook to display the experiment sidebar. If necessary, click the refresh icon to fetch and monitor the latest runs.

To display the more detailed MLflow experiment page, click the experiment page icon. This page allows you to compare runs and view details for specific runs (AWS | Azure | GCP).

Load models

You can also access the results for a specific run using the MLflow API. The code in the following cell illustrates how to load the model trained in a given MLflow run and use it to make predictions. You can also find code snippets for loading specific models on the MLflow run page (AWS | Azure | GCP).
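In the notebook, loading goes through mlflow.pyfunc.load_model with a URI of the form runs:/<run_id>/model, where model is the artifact path of the logged model (the registered model version at the end of this notebook points at an artifacts/model path). The sketch below builds that URI and then demonstrates the check the loaded model should pass: its predictions must match the original's. A pickle round trip and the hypothetical ThresholdModel class stand in for MLflow logging and loading, so the sketch runs without a tracking server.

```python
import pickle


def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build an MLflow runs:/ URI that points at a run's logged model artifact."""
    return f"runs:/{run_id}/{artifact_path}"


class ThresholdModel:
    """Hypothetical toy classifier standing in for the logged scikit-learn model."""

    def __init__(self, feature: str, threshold: float):
        self.feature = feature
        self.threshold = threshold

    def predict(self, rows: list[dict]) -> list[bool]:
        return [row[self.feature] >= self.threshold for row in rows]


# Run ID taken from the second gradient_boost run's log output above.
print(runs_uri("33e6fab73b9943f5b34fc6b355cea369"))

original = ThresholdModel("alcohol", 11.0)
restored = pickle.loads(pickle.dumps(original))  # stand-in for logging, then load_model
rows = [{"alcohol": 9.4}, {"alcohol": 12.8}]
# The reloaded model must agree with the original on the same inputs.
assert restored.predict(rows) == original.predict(rows)
```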

Part 2. Hyperparameter tuning

At this point, you have trained a simple model and used the MLflow tracking service to organize your work. Next, you can perform more sophisticated tuning using Hyperopt.

Parallel training with Hyperopt and SparkTrials

Hyperopt is a Python library for hyperparameter tuning. For more information about using Hyperopt in Databricks, see the documentation (AWS | Azure | GCP).

You can use Hyperopt with SparkTrials to run hyperparameter sweeps and train multiple models in parallel. This reduces the time required to optimize model performance. MLflow tracking is integrated with Hyperopt to automatically log models and parameters.

Hyperopt with SparkTrials will automatically track trials in MLflow. To view the MLflow experiment associated with the notebook, click the 'Runs' icon in the notebook context bar on the upper right. There, you can view all runs. To view logs from trials, please check the Spark executor logs. To view executor logs, expand 'Spark Jobs' above until you see the (i) icon next to the stage from the trial job. Click it and find the list of tasks. Click the 'stderr' link for a task to view trial logs.
3%|▎ | 1/32 [00:24<12:31, 24.25s/trial, best loss: -0.9075482089723049]
12%|█▎ | 4/32 [01:40<12:00, 25.74s/trial, best loss: -0.9102090009155415]
19%|█▉ | 6/32 [02:36<11:25, 26.38s/trial, best loss: -0.9110673208972305]
38%|███▊ | 12/32 [04:39<07:18, 21.93s/trial, best loss: -0.9138640135042344]
81%|████████▏ | 26/32 [09:45<02:18, 23.17s/trial, best loss: -0.9142252231631953]
88%|████████▊ | 28/32 [10:43<01:42, 25.58s/trial, best loss: -0.9161063744563973]
100%|██████████| 32/32 [12:12<00:00, 22.90s/trial, best loss: -0.9161063744563973]
Total Trials: 32: 32 succeeded, 0 failed, 0 cancelled.
2024/10/22 19:16:34 INFO mlflow.tracking._tracking_service.client: 🏃 View run gb_hyperopt at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130/runs/1313b271b0e84f298bd8175784e49409.
2024/10/22 19:16:34 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://db-sme-demo-docs.cloud.databricks.com/ml/experiments/1627933121202130.
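Hyperopt minimizes a loss, so an objective that should maximize AUC returns the negative AUC; that is why the progress output reports a best loss of -0.9161063744563973 while the best run's AUC is 0.9161063744563973. The sketch below illustrates that convention with plain random search over a space shaped like Hyperopt's hp.quniform (integers) and hp.loguniform (learning rate) distributions. It is a stand-in: it uses neither Hyperopt's TPE algorithm nor SparkTrials, the ranges are assumptions, and toy_auc is a hypothetical objective rather than a trained model.

```python
import math
import random

# Assumed ranges, loosely mirroring hp.quniform / hp.loguniform distributions.
SEARCH_SPACE = {
    "n_estimators": (20, 1000),            # integer, uniform
    "max_depth": (2, 5),                   # integer, uniform
    "learning_rate": (math.exp(-3), 1.0),  # log-uniform
}


def sample(rng: random.Random) -> dict:
    lo_lr, hi_lr = SEARCH_SPACE["learning_rate"]
    return {
        "n_estimators": rng.randint(*SEARCH_SPACE["n_estimators"]),  # inclusive bounds
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
        # Sample uniformly in log space, then exponentiate.
        "learning_rate": math.exp(rng.uniform(math.log(lo_lr), math.log(hi_lr))),
    }


def toy_auc(params: dict) -> float:
    """Hypothetical objective in (0, 1); a real one would train and score a model."""
    penalty = 0.02 * abs(params["max_depth"] - 5)
    penalty += 0.03 * abs(math.log10(params["learning_rate"]) + 1)
    penalty += 0.00005 * abs(params["n_estimators"] - 600)
    return 0.95 - penalty


def objective(params: dict) -> float:
    # Minimizing -AUC is the same as maximizing AUC.
    return -toy_auc(params)


def random_search(max_evals: int = 32, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(max_evals):
        params = sample(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best, loss = random_search()
print(f"best loss: {loss:.4f}, AUC: {-loss:.4f}, params: {best}")
```

With Hyperopt itself, the loop is replaced by hyperopt.fmin(fn=..., space=..., algo=tpe.suggest, max_evals=32, trials=SparkTrials(...)), which proposes parameters adaptively and runs the trials on Spark workers.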

Search runs to retrieve the best model

Because MLflow tracks all of the runs, you can use the MLflow search runs API (mlflow.search_runs) to find the tuning run with the highest test AUC and retrieve its metrics and parameters.

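The tuned model should outperform the simpler models trained in Part 1. In this run, the best tuning run reaches a test AUC of about 0.916, compared with about 0.883 and 0.891 for the two Part 1 models.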

Best Run AUC: 0.9161063744563973
Num Estimators: 639.0
Max Depth: 5.0
Learning Rate: 0.08953877758828054
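mlflow.search_runs returns matching runs as a pandas DataFrame with columns such as metrics.test_auc and params.n_estimators, and its order_by argument accepts clauses such as "metrics.test_auc DESC". The sketch below mimics the selection on plain dictionaries so it runs without a tracking server: highest test AUC first, ties broken by the most recent start time. The two gradient_boost run IDs and AUCs come from the Part 1 output; the tuning run's ID and all start_time values are hypothetical placeholders, and its AUC and parameters are the values printed above.

```python
# Run records shaped like rows of the DataFrame that mlflow.search_runs returns.
runs = [
    {"run_id": "38b61d29418a4be4987123fcf4e1bc4b", "start_time": 1,
     "metrics.test_auc": 0.8834365701533531},
    {"run_id": "33e6fab73b9943f5b34fc6b355cea369", "start_time": 2,
     "metrics.test_auc": 0.8914761673151751},
    {"run_id": "hypothetical-tuning-run", "start_time": 3,
     "metrics.test_auc": 0.9161063744563973,
     # MLflow stores parameter values as strings.
     "params.n_estimators": "639.0", "params.max_depth": "5.0",
     "params.learning_rate": "0.08953877758828054"},
]


def best_run(records: list[dict]) -> dict:
    """Equivalent to sorting by test AUC DESC, then start_time DESC, and taking the first row."""
    return max(records, key=lambda r: (r["metrics.test_auc"], r["start_time"]))


best = best_run(runs)
print(best["run_id"], best["metrics.test_auc"])
print("Num Estimators:", int(float(best["params.n_estimators"])))  # convert the string back to a number
```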

Part 3. Save results and models to Unity Catalog

Write results back to Unity Catalog
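Writing results back can follow the same pattern as the earlier table writes: attach each test row's prediction as a new column, then save the rows as a Unity Catalog table. The sketch below only prepares the rows and the three-level table name, so it runs without Spark; the table name wine_predictions, the helper name, and the feature rows are hypothetical, and the commented spark line shows one way a notebook could persist the result.

```python
CATALOG_NAME = "main"
SCHEMA_NAME = "default"
# Hypothetical table name for the predictions.
PREDICTIONS_TABLE = f"{CATALOG_NAME}.{SCHEMA_NAME}.wine_predictions"


def with_predictions(rows: list[dict], predictions: list) -> list[dict]:
    """Return copies of the feature rows with a 'prediction' column appended."""
    if len(rows) != len(predictions):
        raise ValueError("each row needs exactly one prediction")
    return [{**row, "prediction": pred} for row, pred in zip(rows, predictions)]


test_rows = [{"alcohol": 9.4, "is_red": 1.0}, {"alcohol": 12.8, "is_red": 0.0}]  # made-up rows
results = with_predictions(test_rows, [False, True])
print(PREDICTIONS_TABLE, results)
# In a Databricks notebook, something like the following would persist the rows:
# spark.createDataFrame(results).write.mode("overwrite").saveAsTable(PREDICTIONS_TABLE)
```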

Save model to Unity Catalog

Successfully registered model 'main.default.wine_model_vivian'.
Created version '1' of model 'main.default.wine_model_vivian'.
<ModelVersion: aliases=[], creation_timestamp=1729625422695, current_stage=None, description='', last_updated_timestamp=1729625423872, name='main.default.wine_model_vivian', run_id='4a462b1973a34c499e61ad5e5e45efd7', run_link=None, source='dbfs:/databricks/mlflow-tracking/1627933121202130/4a462b1973a34c499e61ad5e5e45efd7/artifacts/model', status='READY', status_message='', tags={}, user_id='vivian.tran@databricks.com', version='1'>
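Once a version is registered, other notebooks or jobs can refer to it by name. MLflow model URIs for registered models take the form models:/<name>/<version>, and Unity Catalog models can also be referenced by alias as models:/<name>@<alias>. The helper below only builds these URIs; the alias champion is a hypothetical example, and in a notebook the URI would be passed to mlflow.pyfunc.load_model with the registry URI set to databricks-uc.

```python
from typing import Optional


def registered_model_uri(name: str, version: Optional[int] = None,
                         alias: Optional[str] = None) -> str:
    """Build a models:/ URI for a registered model, by version number or by alias."""
    if name.count(".") != 2:
        raise ValueError("Unity Catalog model names have three parts: catalog.schema.model")
    if (version is None) == (alias is None):
        raise ValueError("pass exactly one of version or alias")
    if version is not None:
        return f"models:/{name}/{version}"
    return f"models:/{name}@{alias}"


MODEL_NAME = "main.default.wine_model_vivian"  # the model name registered above
print(registered_model_uri(MODEL_NAME, version=1))         # models:/main.default.wine_model_vivian/1
print(registered_model_uri(MODEL_NAME, alias="champion"))  # hypothetical alias
```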