Share models across workspaces

Note

For cross-workspace model development and deployment, Databricks recommends the deploy code approach, where the model training code is deployed to multiple environments.

Databricks supports sharing models across multiple workspaces. For example, you can develop and log a model in a development workspace, and then access and compare it against models in a separate production workspace. This is useful when multiple teams share access to models or when your organization has multiple workspaces to handle the different stages of development.

In such situations, you can access models across Databricks workspaces by using a remote model registry. For example, data scientists could access the production model registry with read-only access to compare their in-development models against the current production models. An example multi-workspace set-up is shown below.

Multiple workspaces

Access to a remote registry is controlled by tokens. Each user or script that needs access creates a personal access token in the remote registry and copies that token into the secret manager of their local workspace. Each API request sent to the remote registry workspace must include the access token; MLflow provides a simple mechanism to specify the secrets to be used when performing model registry operations.

Note

As a security best practice, when authenticating with automated tools, systems, scripts, and apps, Databricks recommends you use access tokens belonging to service principals instead of workspace users. For more information, see Manage service principals.

All client and fluent API methods for model registry are supported for remote workspaces.

Requirements

Using a model registry across workspaces requires the MLflow Python client, release 1.11.0 or above.

Note

This workflow is implemented from logic in the MLflow client. Ensure that the environment running the client has access to make network requests against the Databricks workspace containing the remote model registry. A common restriction put on the registry workspace is an IP allow list, which can disallow connections from MLflow clients running in a cluster in another workspace.

Set up the API token for a remote registry

  1. In the model registry workspace, create an access token.

  2. In the local workspace, create secrets to store the access token and the remote workspace information:

    1. Create a secret scope: databricks secrets create-scope --scope <scope>.

    2. Pick a unique name for the target workspace, shown here as <prefix>. Then create three secrets:

      • databricks secrets put --scope <scope> --key <prefix>-host : Enter the hostname of the model registry workspace. For example, https://cust-success.cloud.databricks.com/.

      • databricks secrets put --scope <scope> --key <prefix>-token : Enter the access token from the model registry workspace.

      • databricks secrets put --scope <scope> --key <prefix>-workspace-id : Enter the workspace ID for the model registry workspace which can be found in the URL of any page.

Note

You may want to share the secret scope with other users, since there is a limit on the number of secret scopes per workspace.

Specify a remote registry

Based on the secret scope and name prefix you created for the remote registry workspace, you can construct a registry URI of the form:

registry_uri = f'databricks://<scope>:<prefix>'

You can use the URI to specify a remote registry for fluent API methods by first calling:

mlflow.set_registry_uri(registry_uri)

Or, you can specify it explicitly when you instantiate an MlflowClient:

client = MlflowClient(registry_uri=registry_uri)

The following workflows show examples of both approaches.

Register a model in the remote registry

One way to register a model is to use the mlflow.register_model API:

mlflow.set_registry_uri(registry_uri)
mlflow.register_model(model_uri=f'runs:/<run_id>/<artifact_path>', name=model_name)

Examples for other model registration methods can be found in the notebook at the end of this page.

Note

Registering a model in a remote workspace creates a temporary copy of the model artifacts in DBFS in the remote workspace. You may want to delete this copy once the model version is in READY status. The temporary files can be found under the /dbfs/databricks/mlflow/tmp-external-source/<run_id> folder.

You can also specify a tracking_uri to point to a MLflow Tracking service in another workspace in a similar manner to registry_uri. This means you can take a run on a remote workspace and register its model in the current or another remote workspace.

Use a model from the remote registry

You can load and use a model version in a remote registry with mlflow.<flavor>.load_model methods by first setting the registry URI:

mlflow.set_registry_uri(registry_uri)
model = mlflow.pyfunc.load_model(f'models:/<model_name>/Staging')
model.predict(...)

Or, you can explicitly specify the remote registry in the models:/ URI:

model = mlflow.pyfunc.load_model(f'models://<scope>:<prefix>@databricks/<model_name>/Staging')
model.predict(...)

Other helper methods for accessing the model files are also supported, such as:

client.get_latest_versions(model_name)
client.get_model_version_download_uri(model_name, version)

Manage a model in the remote registry

You can perform any action on models in the remote registry as long as you have the required permissions. For example, if you have Can Manage permissions on a model, you can transition a model version stage or delete the model using MlflowClient methods:

client = MlflowClient(tracking_uri=None, registry_uri=registry_uri)
client.transition_model_version_stage(model_name, version, 'Archived')
client.delete_registered_model(model_name)

Notebook

Remote Model Registry example notebook

Open notebook in new tab