Manage model lifecycle in Unity Catalog

Preview

This feature is in Public Preview.

This article describes how to use Models in Unity Catalog as part of your machine learning workflow to manage the full lifecycle of ML models. Databricks provides a hosted version of MLflow Model Registry in Unity Catalog. Models in Unity Catalog extends the benefits of Unity Catalog to ML models, including centralized access control, auditing, lineage, and model discovery across workspaces. Models in Unity Catalog is compatible with the open-source MLflow Python client.

Key features of models in Unity Catalog include:

  • Namespacing and governance for models, so you can group and govern models at the environment, project, or team level (“Grant data scientists read-only access to production models”).

  • Chronological model lineage (which MLflow experiment and run produced the model at a given time).

  • Model Serving.

  • Model versioning.

  • Model deployment via aliases. For example, mark the “Champion” version of a model within your prod catalog.

This article includes instructions for both the Models in Unity Catalog UI and API.

For an overview of Model Registry concepts, see the MLflow guide.

Note

This article documents Models in Unity Catalog, which Databricks recommends for governing and deploying models. For documentation of the classic Workspace Model Registry, see Manage model lifecycle using the Workspace Model Registry. For guidance on how to upgrade from the Workspace Model Registry to Unity Catalog, see Migrate workflows and models to Unity Catalog.

Requirements

  1. Unity Catalog must be enabled in your workspace. See Get started using Unity Catalog to create a Unity Catalog Metastore, enable it in a workspace, and create a catalog. If Unity Catalog is not enabled, you can still use the classic workspace model registry.

  2. Your workspace must be attached to a Unity Catalog metastore that supports privilege inheritance. This is true for all metastores created after August 25, 2022. If your workspace uses an older metastore, follow the documented upgrade steps.

  3. You must have access to run commands on a cluster with access to Unity Catalog.

  4. To create new registered models, you need the CREATE_MODEL privilege on a schema, in addition to the USE SCHEMA and USE CATALOG privileges on the schema and its enclosing catalog. CREATE_MODEL is a new schema-level privilege that you can grant using the Catalog Explorer UI or the SQL GRANT command, as shown below.

    GRANT CREATE_MODEL ON SCHEMA <schema-name> TO <principal>
    

Upgrade training workloads to Unity Catalog

This section includes instructions to upgrade existing training workloads to Unity Catalog.

Install MLflow Python client

Support for models in Unity Catalog is included in Databricks Runtime 13.2 ML and above. You can also use models in Unity Catalog on Databricks Runtime 11.3 LTS and above by installing the latest version of the MLflow Python client in your notebook, using the code below.

%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()

Configure MLflow client to access models in Unity Catalog

By default, the MLflow Python client creates models in the Databricks workspace model registry. To upgrade to models in Unity Catalog, configure the MLflow client:

import mlflow
mlflow.set_registry_uri("databricks-uc")

Train and register Unity Catalog-compatible models

Permissions required: To create a new registered model, you need the CREATE_MODEL and USE SCHEMA privileges on the enclosing schema, and USE CATALOG privilege on the enclosing catalog. To create new model versions under a registered model, you must be the owner of the registered model and have USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

ML model versions in Unity Catalog must have a model signature. If you’re not already logging MLflow models with signatures in your model training workloads, you can either:

  • Use Databricks autologging, which automatically logs models with signatures for many popular ML frameworks. See supported frameworks in the MLflow docs.

  • With MLflow 2.5.0 and above, you can specify an input example in your mlflow.<flavor>.log_model call, and the model signature is automatically inferred. For further information, refer to the MLflow documentation.

Then, pass the three-level name of the model to MLflow APIs, in the form <catalog>.<schema>.<model>.

The examples in this section create and access models in the ml_team schema under the prod catalog.

The model training examples in this section create a new model version and register it in the prod catalog. Using the prod catalog doesn’t necessarily mean that the model version serves production traffic. The model version’s enclosing catalog, schema, and registered model reflect its environment (prod) and associated governance rules (for example, privileges can be set up so that only admins can delete from the prod catalog), but not its deployment status. To manage the deployment status, use model aliases.

Register a model to Unity Catalog using autologging

import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Train a sklearn model on the iris dataset.
# Databricks autologging (enabled by default in ML runtimes) logs the
# model with a signature during training.
X, y = datasets.load_iris(return_X_y=True, as_frame=True)
clf = RandomForestClassifier(max_depth=7)
clf.fit(X, y)

# Note that the UC model name follows the pattern
# <catalog_name>.<schema_name>.<model_name>, corresponding to
# the catalog, schema, and registered model name
# in Unity Catalog under which to create the version.
# The registered model is created if it doesn't already exist.
autolog_run = mlflow.last_active_run()
model_uri = "runs:/{}/model".format(autolog_run.info.run_id)
mlflow.register_model(model_uri, "prod.ml_team.iris_model")

Register a model to Unity Catalog with automatically inferred signature

Support for automatically inferred signatures is available in MLflow version 2.5.0 and above, and is supported in Databricks Runtime 11.3 LTS ML and above. To use automatically inferred signatures, use the following code to install the latest MLflow Python client in your notebook:

%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()

The following code shows an example of an automatically inferred signature.

import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    # Take the first row of the training dataset as the model input example.
    input_example = X.iloc[[0]]
    # Log the model and register it as a new version in UC.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )

View models in the UI

Permissions required: To view a registered model and its model versions in the UI, you need EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can view and manage registered models and model versions in Unity Catalog using the Catalog Explorer.

Control access to models

For information about controlling access to models registered in Unity Catalog, see Unity Catalog privileges and securable objects. For best practices on organizing models across catalogs and schemas, see Organize your data.

You can configure model permissions programmatically using the Grants REST API. When configuring model permissions, set securable_type to "FUNCTION" in REST API requests. For example, use PATCH /api/2.1/unity-catalog/permissions/function/{full_name} to update registered model permissions.
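As an illustrative sketch of such a request, the snippet below assembles the endpoint URL and payload for granting EXECUTE on a registered model; the workspace URL and principal name are hypothetical placeholders:

```python
# Hypothetical placeholders for illustration.
workspace_url = "https://<databricks-instance>"
model_full_name = "prod.ml_team.iris_model"

# Registered models use the FUNCTION securable type in the Grants API.
endpoint = f"{workspace_url}/api/2.1/unity-catalog/permissions/function/{model_full_name}"
payload = {"changes": [{"principal": "data-scientists", "add": ["EXECUTE"]}]}

# To apply the grant, send the payload in an authenticated PATCH request,
# for example: requests.patch(endpoint, headers={"Authorization": f"Bearer {token}"}, json=payload)
```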

Deploy and organize models with aliases and tags

Model aliases and tags help you organize and manage models in Unity Catalog.

Model aliases allow you to assign a mutable, named reference to a particular version of a registered model. You can use aliases to indicate the deployment status of a model version. For example, you could allocate a “Champion” alias to the model version currently in production and target this alias in workloads that use the production model. You can then update the production model by reassigning the “Champion” alias to a different model version.

Tags are key-value pairs that you associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key "task" and value "question-answering" (displayed in the UI as task:question-answering) to registered models intended for question answering tasks. At the model version level, you could tag versions undergoing pre-deployment validation with validation_status:pending and those cleared for deployment with validation_status:approved.

See the following sections for how to use aliases and tags.

Set and delete aliases on models

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can set, update, and remove aliases for models in Unity Catalog by using Catalog Explorer. You can manage aliases for a registered model on the model details page, and configure aliases for a specific model version on the model version details page.

To set, update, and delete aliases using the MLflow Client API, see the examples below:

from mlflow import MlflowClient
client = MlflowClient()

# create "Champion" alias for version 1 of model "prod.ml_team.iris_model"
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 1)

# reassign the "Champion" alias to version 2
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 2)

# get a model version by alias
client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")

# delete the alias
client.delete_registered_model_alias("prod.ml_team.iris_model", "Champion")

Set and delete tags on models

Permissions required: You must be the owner of the registered model or have the APPLY_TAG privilege on it, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

See Manage tags in Catalog Explorer for how to set and delete tags using the UI.

To set and delete tags using the MLflow Client API, see the examples below:

from mlflow import MlflowClient
client = MlflowClient()

# Set registered model tag
client.set_registered_model_tag("prod.ml_team.iris_model", "task", "classification")

# Delete registered model tag
client.delete_registered_model_tag("prod.ml_team.iris_model", "task")

# Set model version tag
client.set_model_version_tag("prod.ml_team.iris_model", "1", "validation_status", "approved")

# Delete model version tag
client.delete_model_version_tag("prod.ml_team.iris_model", "1", "validation_status")

Both registered model and model version tags must meet the platform-wide constraints.

For more details on alias and tag client APIs, see the MLflow API documentation.

Load models for inference

Consume model versions by alias in inference workloads

Permissions required: EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can write batch inference workloads that reference a model version by alias. For example, the snippet below loads and applies the “Champion” model version for batch inference. If the “Champion” alias is later reassigned to a new model version, the batch inference workload automatically picks it up on its next execution. This allows you to decouple model deployments from your batch inference workloads.

import mlflow.pyfunc
model_version_uri = "models:/prod.ml_team.iris_model@Champion"
champion_version = mlflow.pyfunc.load_model(model_version_uri)
champion_version.predict(test_x)

You can also write deployment workflows to get a model version by alias and update a model serving endpoint to serve that version, using the model serving REST API:

import mlflow
import requests
client = mlflow.tracking.MlflowClient()
champion_version = client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")
# Invoke the model serving REST API to update endpoint to serve the current "Champion" version
model_name = champion_version.name
model_version = champion_version.version
requests.request(...)
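A minimal sketch of the request such a workflow might send follows, assuming the serving endpoints config API (PUT /api/2.0/serving-endpoints/{name}/config); the workspace URL, endpoint name, and request body shape are illustrative, so check the model serving REST API reference for the exact schema:

```python
# Hypothetical placeholders for illustration.
workspace_url = "https://<databricks-instance>"
endpoint_name = "iris-endpoint"

# Values that would come from get_model_version_by_alias(...).
model_name = "prod.ml_team.iris_model"
model_version = "2"

# One possible request body for pointing the endpoint at the new version.
config = {
    "served_models": [
        {
            "model_name": model_name,
            "model_version": model_version,
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
    ]
}

# To apply, send an authenticated request, for example:
# requests.put(f"{workspace_url}/api/2.0/serving-endpoints/{endpoint_name}/config",
#              headers={"Authorization": f"Bearer {token}"}, json=config)
```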

Consume model versions by version number in inference workloads

You can also load model versions by version number:

import mlflow.pyfunc
# Load version 1 of the model "prod.ml_team.iris_model"
model_version_uri = "models:/prod.ml_team.iris_model/1"
first_version = mlflow.pyfunc.load_model(model_version_uri)
first_version.predict(test_x)

Share models across workspaces

As long as you have the appropriate privileges, you can access models in Unity Catalog from any workspace. For example, you can access models from the prod catalog in a dev workspace, to facilitate comparing newly-developed models to the production baseline.

To collaborate with other users (share write privileges) on a registered model you created, you must grant ownership of the model to a group containing yourself and the users you’d like to collaborate with. Collaborators must also have the USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the model. See Unity Catalog privileges and securable objects for details.

Annotate a model or model version

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can provide information about a model or model version by annotating it. For example, you may want to include an overview of the problem or information about the methodology and algorithm used.

Annotate a model or model version using the UI

See How to add markdown comments to data objects.

Annotate a model or model version using the API

To update a registered model description, use the MLflow Client API update_registered_model() method:

client = MlflowClient()
client.update_registered_model(
  name="<model-name>",
  description="<description>"
)

To update a model version description, use the MLflow Client API update_model_version() method:

client = MlflowClient()
client.update_model_version(
  name="<model-name>",
  version=<model-version>,
  description="<description>"
)

Rename a model (API only)

Permissions required: Owner of the registered model, CREATE_MODEL privilege on the schema containing the registered model, and USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

To rename a registered model, use the MLflow Client API rename_registered_model() method:

client = MlflowClient()
client.rename_registered_model("<model-name>", "<new-model-name>")

Delete a model or model version

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can delete a registered model or a model version within a registered model using the Catalog Explorer UI or the API.

Delete a model version or model using the API

Warning

You cannot undo this action. When you delete a model, all model artifacts stored by Unity Catalog and all the metadata associated with the registered model are deleted.

Delete a model version

To delete a model version, use the MLflow Client API delete_model_version() method:

# Delete versions 1, 2, and 3 of the model
client = MlflowClient()
versions = [1, 2, 3]
for version in versions:
  client.delete_model_version(name="<model-name>", version=version)

Delete a model

To delete a model, use the MLflow Client API delete_registered_model() method:

client = MlflowClient()
client.delete_registered_model(name="<model-name>")

List and search models

You can list registered models in Unity Catalog with MLflow’s search_registered_models() Python API:

client = MlflowClient()
client.search_registered_models()

You can also search for a specific model name and list its version details using the search_model_versions() method:

from pprint import pprint

client = MlflowClient()
[pprint(mv) for mv in client.search_model_versions("name='<model-name>'")]

Example

This example illustrates how to use Models in Unity Catalog to build a machine learning application.

Models in Unity Catalog example

Migrate workflows and models to Unity Catalog

The articles linked below describe how to migrate workflows and models (model training and batch inference jobs) from the Workspace Model Registry to Unity Catalog. Databricks recommends using Models in Unity Catalog for improved governance, easy sharing across workspaces and environments, and more flexible MLOps workflows.

Limitations on Unity Catalog support

  • Stages are not supported for models in Unity Catalog. Databricks recommends using the three-level namespace in Unity Catalog to express the environment a model is in, and using aliases to promote models for deployment. See the upgrade guide for details.

  • Webhooks are not supported for models in Unity Catalog. See suggested alternatives in the upgrade guide.

  • Some search API fields and operators are not supported for models in Unity Catalog. This can be mitigated by calling the search APIs using supported filters and scanning the results. Following are some examples:

    • The order_by parameter is not supported in the search_model_versions or search_registered_models client APIs.

    • Tag-based filters (tags.mykey = 'myvalue') are not supported for search_model_versions or search_registered_models.

    • Operators other than exact equality (for example, LIKE, ILIKE, !=) are not supported for search_model_versions or search_registered_models.

    • Searching registered models by name (for example, MlflowClient().search_registered_models(filter_string="name='main.default.mymodel'")) is not supported. To fetch a particular registered model by name, use get_registered_model.

  • Email notifications and comment discussion threads on registered models and model versions are not supported in Unity Catalog.

  • The activity log is not supported for models in Unity Catalog. However, you can track activity on models in Unity Catalog using audit logs.