Use custom Python libraries with Model Serving
In this article, you learn how to include custom libraries or libraries from a private mirror server when you log your model, so that you can use them with Model Serving model deployments. You should complete the steps detailed in this guide after you have a trained ML model ready to deploy but before you create a Databricks Model Serving endpoint.
Model development often requires custom Python libraries that contain functions for pre- or post-processing, custom model definitions, and other shared utilities. In addition, many enterprise security teams encourage the use of private PyPI mirrors, such as Nexus or Artifactory, to reduce the risk of supply-chain attacks. Databricks offers native support for installing custom libraries and libraries from a private mirror in the Databricks workspace.
Step 1: Upload dependency file
Databricks recommends that you upload your dependency file to Unity Catalog volumes. Alternatively, you can upload it to Databricks File System (DBFS) using the Databricks UI.
To ensure your library is available to your notebook, you need to install it using %pip. Using %pip installs the library in the current notebook and downloads the dependency to the cluster.
Step 2: Log the model with a custom library
Important
The guidance in this section is not required if you install the private library by pointing to a custom PyPI mirror.
After you install the library and upload the Python wheel file to either Unity Catalog volumes or DBFS, include the following code in your script. In extra_pip_requirements, specify the path of your dependency file.
mlflow.sklearn.log_model(model, "sklearn-model", extra_pip_requirements=["/volume/path/to/dependency.whl"])
For DBFS, use the following:
mlflow.sklearn.log_model(model, "sklearn-model", extra_pip_requirements=["/dbfs/path/to/dependency.whl"])
If you have a custom library, you must specify all custom Python libraries associated with your model when you configure logging. You can do so with the extra_pip_requirements or conda_env parameters in log_model().
Important
If using DBFS, be sure to include a forward slash, /, before your dbfs path when logging extra_pip_requirements. Learn more about DBFS paths in Work with files on Databricks.
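The leading-slash requirement is easy to miss when paths are assembled in code. The following is a minimal sketch of a helper that normalizes a DBFS path before it is passed to extra_pip_requirements. The function name to_local_dbfs_path is hypothetical and not part of MLflow or Databricks, and the sketch assumes that a dbfs:/ URI corresponds to the same location under /dbfs/.

```python
def to_local_dbfs_path(path: str) -> str:
    """Return a DBFS path in the '/dbfs/...' form, with the leading slash.

    Hypothetical helper for illustration; not part of MLflow or Databricks.
    """
    if path.startswith("dbfs:/"):
        # Assumption: 'dbfs:/path/x.whl' maps to '/dbfs/path/x.whl'.
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    if path.startswith("dbfs/"):
        # The leading forward slash is missing, so add it.
        return "/" + path
    # Anything else (including paths already starting with '/dbfs/') is unchanged.
    return path


requirements = [to_local_dbfs_path("dbfs/path/to/dependency.whl")]
print(requirements)
```

The resulting list can then be passed as the extra_pip_requirements argument of log_model().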
import mlflow
from mlflow.utils.environment import _mlflow_conda_env

# Build a conda environment that lists the custom library as a pip dependency.
conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["/volumes/path/to/dependency"],
    additional_conda_channels=None,
)
mlflow.pyfunc.log_model(..., conda_env=conda_env)
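The conda_env parameter also accepts a plain dictionary, which avoids relying on a private MLflow helper. The sketch below builds such a dictionary by hand. The function name build_conda_env, the environment name, the channel, and the Python version are illustrative assumptions, so adjust them to match your training environment.

```python
def build_conda_env(pip_deps, python_version="3.10"):
    """Build a conda environment dictionary with the given pip dependencies.

    Hypothetical helper; the python_version default is an assumption.
    """
    return {
        "name": "mlflow-env",
        "channels": ["conda-forge"],
        "dependencies": [
            f"python={python_version}",
            "pip",
            # Pip-installed packages, including the custom library path.
            {"pip": list(pip_deps)},
        ],
    }


conda_env = build_conda_env(["mlflow", "/volumes/path/to/dependency"])
print(conda_env["dependencies"])
```

A dictionary built this way can be passed as conda_env in the same log_model() call.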
Step 3: Update MLflow model with Python wheel files
MLflow provides the add_libraries_to_model() utility to log your model with all of its dependencies pre-packaged as Python wheel files. This packages your custom libraries alongside the model in addition to all other libraries that are specified as dependencies of your model. This guarantees that the libraries used by your model are exactly the ones accessible from your training environment.
In the following example, model_uri references the model registry using the syntax models:/<model-name>/<model-version>.
When you use the model registry URI, this utility generates a new version under your existing registered model.
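import mlflow.models.utils

# Replace the placeholders with your registered model name and version.
model_uri = "models:/<model-name>/<model-version>"
mlflow.models.utils.add_libraries_to_model(model_uri)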
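To make the URI syntax concrete, here is a small sketch that assembles a model registry URI from a name and a version. The helper registry_model_uri and the example name my_model are hypothetical; only the models:/<model-name>/<model-version> format comes from the text above.

```python
def registry_model_uri(name: str, version: int) -> str:
    """Format a model registry URI as models:/<model-name>/<model-version>."""
    if not name:
        raise ValueError("model name must not be empty")
    if version < 1:
        raise ValueError("model version must be a positive integer")
    return f"models:/{name}/{version}"


# 'my_model' and version 3 are placeholder values for illustration.
model_uri = registry_model_uri("my_model", 3)
print(model_uri)
```

The resulting string is the value you would pass to add_libraries_to_model().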
Step 4: Serve your model
When a new model version with the packages included is available in the model registry, you can add this model version to an endpoint with Model Serving.
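As a rough illustration, the sketch below builds the request body you might send when creating a serving endpoint for the new model version. It only constructs the payload and does not call any API. The field names reflect one understanding of the Databricks serving endpoints REST API and should be checked against the current API reference. The helper name and the endpoint name, model name, and workload size values are assumptions.

```python
import json


def serving_endpoint_payload(endpoint_name, model_name, model_version):
    """Build a request body for creating a serving endpoint.

    Hypothetical helper; verify field names against the current API reference.
    """
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    # The version is sent as a string in this sketch.
                    "entity_version": str(model_version),
                    "workload_size": "Small",  # assumed default size
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    }


payload = serving_endpoint_payload("my-endpoint", "my_model", 3)
print(json.dumps(payload, indent=2))
```

You can also create the endpoint through the Serving UI, in which case you select the same model name and version interactively.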