Serverless optimized deployments for model serving endpoints
This article describes how to use serverless optimized deployments on your model serving endpoints. Serverless optimized deployments dramatically lower deployment times and keep the model serving environment the same as the model training environment.
What are serverless optimized deployments?
Serverless optimized deployments take advantage of packaging and staging model artifacts in serverless notebook environments during model registration, resulting in accelerated endpoint deployment and consistent environments between training and serving.
This differs from non-serverless optimized deployments, where model artifacts and environments are packaged into containers at deployment time. In such cases, the serving environment may not match the one used during model training.
Requirements
Serverless optimized endpoints have the same requirements as model serving endpoints (see Requirements). In addition:
- The model must be a custom model (not a Foundation Model APIs model)
- The model must be logged and registered in a serverless notebook using client version 3 or 4
- The model must be logged and registered with mlflow>=3.1 (a quick version check is sketched after this list)
- The model must be registered in Unity Catalog and served with CPU compute
- The model's environment can be at most 1 GB
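Before logging, you can confirm that the notebook environment satisfies the MLflow requirement. The following is a minimal sketch; it only checks the installed mlflow package version, not the serverless environment client version:

```python
import mlflow
from packaging.version import Version

# Confirm the notebook environment has mlflow>=3.1 before logging the model.
assert Version(mlflow.__version__) >= Version("3.1"), (
    f"mlflow {mlflow.__version__} is too old; serverless optimized "
    "deployments require mlflow>=3.1"
)
```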
Using serverless optimized deployments
When logging and registering a model, use a serverless notebook with client version 3 or 4 and mlflow>=3.1.
To adjust the client version of the serverless environment, see Configure the serverless environment.
Then, when registering the model, pass the env_pack parameter:
```python
import mlflow
from mlflow.utils.env_pack import EnvPackConfig

# Register the model and pack the serving environment at registration time.
mlflow.register_model(
    model_info.model_uri,
    model_name,
    env_pack=EnvPackConfig(name="databricks_model_serving"),
)
```
Passing the env_pack parameter causes mlflow.register_model() to package and stage the model artifacts and the serverless notebook environment during model registration so they are ready for deployment. Registration may take longer than it would without env_pack.
EnvPackConfig has an install_dependencies parameter (True by default) that controls whether the model's dependencies are installed in the current environment to confirm the environment is valid. To skip this validation step, set it to False.
Endpoints in workspaces without internet access, or endpoints that depend on custom libraries, may fail if install_dependencies is set to True. In these cases, set install_dependencies to False.
You can also pass the string "databricks_model_serving" instead of EnvPackConfig(...). This shorthand is equivalent to EnvPackConfig(name="databricks_model_serving", install_dependencies=True).
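For example, the following sketch shows both options, reusing model_info and model_name from the example above:

```python
import mlflow
from mlflow.utils.env_pack import EnvPackConfig

# Skip dependency installation, for example in a workspace without internet
# access or when the model depends on custom libraries.
mlflow.register_model(
    model_info.model_uri,
    model_name,
    env_pack=EnvPackConfig(
        name="databricks_model_serving",
        install_dependencies=False,
    ),
)

# String shorthand: equivalent to
# EnvPackConfig(name="databricks_model_serving", install_dependencies=True).
mlflow.register_model(
    model_info.model_uri,
    model_name,
    env_pack="databricks_model_serving",
)
```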
After model registration finishes, you can deploy the model with Model Serving. Notice that deployment time is reduced and the endpoint's event logs no longer show a container build.
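As an illustration, a registered model can be deployed to a serving endpoint with the Databricks SDK for Python. This is a minimal sketch, not the only way to deploy; the endpoint name, model name, and version below are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()

# Create a CPU serving endpoint for the registered Unity Catalog model.
# "my-endpoint", the model name, and the version are placeholders.
w.serving_endpoints.create(
    name="my-endpoint",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.my_model",  # catalog.schema.model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```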