Train and register machine learning models with Unity Catalog
Unity Catalog allows you to apply fine-grained security to tables and models while interacting seamlessly with other machine-learning components in Databricks. This article shows how to use Python to train a machine-learning model using data in Unity Catalog and register the model in Unity Catalog.
Requirements
Your workspace must be enabled for Unity Catalog.
You must have the ability to create a cluster or have access to a cluster running in single-user access mode.
Create a Databricks Machine Learning cluster
Follow these steps to create a single-user Databricks Runtime ML cluster that can access data in Unity Catalog.
Click Compute.
Click Create compute.
Under Access Mode, select Single User.
Databricks Runtime ML includes libraries that require single-user clusters. A single-user cluster can be used by only one user (by default, the owner of the cluster). Other users cannot attach to the cluster.
For more information about the features available in each access mode, see What is cluster access mode?.
In the Databricks runtime version drop-down menu, select ML, then select 11.3 LTS ML or higher.
Click Create Cluster.
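If you prefer to describe the cluster in code rather than in the UI, the same choices can be written as a cluster specification. The following is a minimal sketch, assuming the field names used by the Databricks Clusters API (spark_version, data_security_mode, single_user_name). The cluster name, user name, node type, and runtime version string are hypothetical placeholders; check them against the values available in your workspace.

```python
def single_user_ml_cluster_spec(cluster_name, user_name, node_type_id,
                                spark_version="11.3.x-cpu-ml-scala2.12",
                                num_workers=1):
    """Build a specification for a single-user Databricks Runtime ML cluster.

    All argument values are illustrative; the default runtime string is an
    assumed identifier for 11.3 LTS ML and may differ in your workspace.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,       # an ML runtime, 11.3 LTS ML or higher
        "node_type_id": node_type_id,         # cloud-specific instance type
        "num_workers": num_workers,
        "data_security_mode": "SINGLE_USER",  # corresponds to Access Mode: Single User
        "single_user_name": user_name,        # the one user allowed to attach
    }


# Hypothetical values for illustration only.
spec = single_user_ml_cluster_spec("ml-uc-cluster", "someone@example.com", "i3.xlarge")
print(spec)
```

The dictionary could then be submitted with whichever client you use for the Clusters API; that step is omitted here because it depends on your authentication setup.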
Create the catalog
Follow these steps to create a new catalog where your machine learning team can store their data assets.
In a workspace with the metastore assigned, log in as the metastore admin, or as a user with the CREATE CATALOG privilege.
Create a notebook or open the Databricks SQL editor.
Run the following command to create the ml catalog:
CREATE CATALOG ml;
When you create a catalog, a schema named default is automatically created within it.
Grant access to the ml catalog and the ml.default schema, and the ability to create tables and views, to the ml_team group. To include all account-level users, you could use the group account users instead.
GRANT USE CATALOG ON CATALOG ml TO `ml_team`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA ml.default TO `ml_team`;
Now, any user in the ml_team group can run the following example notebook.
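Because this article uses Python, it may be convenient to issue the same statements from a notebook cell. The following is a minimal sketch that only builds the SQL strings. The helper names quote_ident and catalog_setup_statements are hypothetical, and the sketch assumes the catalog and schema names are simple identifiers that need no quoting.

```python
def quote_ident(name):
    """Wrap a principal name in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def catalog_setup_statements(catalog, group, schema="default"):
    """Return the CREATE CATALOG and GRANT statements used in this section."""
    principal = quote_ident(group)
    return [
        f"CREATE CATALOG {catalog}",
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA, CREATE TABLE ON SCHEMA {catalog}.{schema} TO {principal}",
    ]


statements = catalog_setup_statements("ml", "ml_team")
for statement in statements:
    print(statement)

# In a Databricks notebook, where a `spark` session is predefined,
# each statement could be executed with: spark.sql(statement)
```

Building the statements separately from running them makes it easy to review exactly what will be granted before anything is executed.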
Import the example notebook
To get started, import the following notebook.
To import the notebook:
Next to the notebook, click Copy link for import.
In your workspace, click Workspace.
Next to a folder, click the menu icon, then click Import.
Click URL, then paste in the link you copied.
The imported notebook appears in the folder you selected. Double-click the notebook name to open it.
At the top of the notebook, select your Databricks Machine Learning cluster to attach the notebook to it.
The notebook is divided into several high-level sections:
Setup.
Read data from CSV files and write it to Unity Catalog.
Load the data into pandas DataFrames and clean it up.
Train a basic classification model.
Tune hyperparameters and optimize the model.
Register the model in Unity Catalog.
Write the results to a new table and share it with other users.
To run a cell, click Run. To run the entire notebook, click Run All.
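To give a sense of what the registration step involves, here is a minimal sketch. It is not the notebook's actual code. It assumes MLflow's databricks-uc registry URI and a three-level model name in the form catalog.schema.model. The model name wine_quality and the helper names are hypothetical, and registering models in Unity Catalog may require a newer MLflow client than the one bundled with your runtime.

```python
def uc_name(catalog, schema, name):
    """Return a three-level Unity Catalog name: catalog.schema.name."""
    parts = (catalog, schema, name)
    for part in parts:
        # Each component must be non-empty and must not contain a dot itself.
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


def register_in_unity_catalog(model_uri, full_name):
    """Register an already-logged MLflow model under a Unity Catalog name.

    mlflow is imported lazily so the naming helper works without it installed.
    """
    import mlflow

    mlflow.set_registry_uri("databricks-uc")  # target Unity Catalog, not the workspace registry
    return mlflow.register_model(model_uri, full_name)


# Hypothetical model name for illustration only.
model_name = uc_name("ml", "default", "wine_quality")
print(model_name)

# On a cluster, after logging a model in an MLflow run:
# register_in_unity_catalog("runs:/<run_id>/model", model_name)
```

The same three-level naming applies to the tables the notebook writes, which is why the ml catalog and ml.default schema must exist and be accessible before you run it.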