MLflow Model Serving on Databricks


This feature is in Public Preview.

MLflow Model Serving allows you to host machine learning models from Model Registry as REST endpoints that are updated automatically based on the availability of model versions and their stages.

When you enable model serving for a given registered model, Databricks automatically creates a unique cluster for the model and deploys all non-archived versions of the model on that cluster. Databricks restarts the cluster if an error occurs and terminates the cluster when you disable model serving for the model. Model serving automatically syncs with Model Registry and deploys any new registered model versions. Deployed model versions can be queried with a standard REST API request. Databricks authenticates requests to the model using its standard authentication.

While this service is in preview, Databricks recommends its use for low throughput and non-critical applications. Target throughput is 200 qps and target availability is 99.5%, although no guarantee is made as to either. Additionally, there is a payload size limit of 16 MB per request.
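Because requests over the 16 MB limit are rejected, it can help to check the payload size on the client before sending it. The sketch below assumes that the limit applies to the UTF-8 encoded JSON request body and that 16 MB means 16 × 1024 × 1024 bytes. Both are assumptions, and the helper name `check_payload_size` is hypothetical.

```python
import json

# Assumption: "16 MB" is interpreted as 16 MiB of UTF-8 encoded JSON body.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024

def check_payload_size(payload, limit=MAX_PAYLOAD_BYTES):
    """Serialize `payload` to JSON and return its encoded size in bytes.

    Raises ValueError if the body would exceed `limit`.
    Hypothetical client-side helper, not part of any Databricks API.
    """
    body = json.dumps(payload).encode("utf-8")
    size = len(body)
    if size > limit:
        raise ValueError(f"Payload is {size} bytes, over the {limit}-byte limit")
    return size
```

Call it on the same object you pass as the request body, for example `check_payload_size({"inputs": [[5.1, 3.5, 1.4, 0.2]]})`, before sending the request.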

Each model version is deployed using MLflow model deployment and runs in a Conda environment specified by its dependencies.

Note the following about the serving cluster:

  • The cluster is maintained as long as serving is enabled, even if no active model version exists. To terminate the serving cluster, disable model serving for the registered model.
  • The cluster is considered an all-purpose cluster, subject to all-purpose workload pricing.
  • Global init scripts are not run on serving clusters.

Requirements

  • MLflow Model Serving is available for Python MLflow models. You must declare all model dependencies in the conda environment.
  • To enable Model Serving, you must have cluster creation permission.
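As a rough illustration of declaring dependencies, the sketch below builds a conda environment specification as a Python dictionary, which is the shape MLflow's `log_model` functions accept for their `conda_env` argument. The helper `build_conda_env`, the environment name, the `conda-forge` channel, and the package pins are hypothetical placeholders. Replace them with your model's actual dependencies.

```python
def build_conda_env(python_version, pip_packages, name="serving-env"):
    """Return a conda environment spec as a dict (hypothetical helper).

    Assumption: MLflow itself should be present so the serving environment
    can load the model; a user-supplied mlflow pin is kept as-is.
    """
    pip_deps = list(pip_packages)
    if not any(p.split("==")[0].strip() == "mlflow" for p in pip_deps):
        pip_deps.insert(0, "mlflow")
    return {
        "name": name,
        "channels": ["conda-forge"],  # placeholder channel
        "dependencies": [
            f"python={python_version}",
            "pip",
            {"pip": pip_deps},
        ],
    }

# Placeholder pins: list every library the model imports at inference time.
conda_env = build_conda_env("3.8", ["scikit-learn==0.24.1", "pandas==1.2.4"])

# The dict can then be passed when logging the model, for example:
# mlflow.sklearn.log_model(model, "model", conda_env=conda_env)
```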

Model serving from Model Registry

Model serving is available in Databricks from Model Registry.

Enable and disable model serving

You enable a model for serving from its registered model page.

  1. Click the Serving tab. If the model is not already enabled for serving, the Enable Serving button appears.

  2. Click Enable Serving. The Serving tab appears with Status shown as Pending. After a few minutes, Status changes to Ready.

To disable a model for serving, click Stop.

Validate model serving

From the Serving tab, you can send a request to the served model and view the response.


Model version URIs

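Each deployed model version is assigned one or more unique URIs. At minimum, each model version is assigned a URI constructed as follows:

```
https://<databricks-instance>/model/<registered-model-name>/<model-version>/invocations
```

For example, to call version 1 of a model registered as iris-classifier, use this URI:

```
https://<databricks-instance>/model/iris-classifier/1/invocations
```

You can also call a model version by its stage. For example, if version 1 is in the Production stage, it can also be scored using this URI:

```
https://<databricks-instance>/model/iris-classifier/Production/invocations
```
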
The list of available model URIs appears at the top of the Model Versions tab on the serving page.
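Scoring URIs follow the pattern https://<databricks-instance>/model/<registered-model-name>/<version-or-stage>/invocations, as in the iris-classifier example used in this article. The sketch below builds such a URI from its parts. The helper name `model_invocation_uri` is hypothetical, and percent-encoding the model name and version segment is an assumption added for names that contain spaces or other special characters.

```python
from urllib.parse import quote

def model_invocation_uri(instance, model_name, version_or_stage):
    """Build https://<instance>/model/<model-name>/<version-or-stage>/invocations.

    `version_or_stage` is a version number (e.g. 1) or a stage name
    (e.g. "Production"). Hypothetical helper for illustration.
    """
    host = instance.rstrip("/")
    if "://" not in host:
        host = "https://" + host
    name = quote(model_name, safe="")
    target = quote(str(version_or_stage), safe="")
    return f"{host}/model/{name}/{target}/invocations"
```

For example, `model_invocation_uri("<databricks-instance>", "iris-classifier", "Production")` yields the stage-based URI used in the REST examples later in this article.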

Manage served versions

All active (non-archived) model versions are deployed, and you can query them using the URIs. Databricks automatically deploys new model versions when they are registered, and automatically removes old versions when they are archived.


All deployed versions of a registered model share the same cluster.

Manage model access rights

Model access rights are inherited from the Model Registry. Enabling or disabling the serving feature requires ‘manage’ permission on the registered model. Anyone with read rights can score any of the deployed versions.

Score deployed model versions

To score a deployed model, you can use the UI or send a REST API request to the model URI.

Score via UI

This is the easiest and fastest way to test the model. Enter the model input data in JSON format and click Send Request. If the model was logged with an input example, click Load Example to load it.

Score via REST API request

You can send a scoring request through the REST API using standard Databricks authentication. The examples below demonstrate authentication using a personal access token.

The following snippets show how to query a served model, given a MODEL_VERSION_URI such as https://<databricks-instance>/model/iris-classifier/Production/invocations (where <databricks-instance> is the name of your Databricks instance) and a Databricks REST API token stored in DATABRICKS_API_TOKEN.

Query a model that accepts DataFrame inputs:

```bash
curl "$MODEL_VERSION_URI" \
  -H "Authorization: Bearer $DATABRICKS_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2
    }
  ]'
```

Query a model that accepts tensor inputs. Format tensor inputs as described in TensorFlow Serving’s API docs.

```bash
curl "$MODEL_VERSION_URI" \
  -H "Authorization: Bearer $DATABRICKS_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}'
```

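In Python, the following function scores a model that accepts either input type:

```python
import numpy as np
import pandas as pd
import requests

# Placeholders: replace with your model's serving URI and your personal access token.
MODEL_VERSION_URI = "https://<databricks-instance>/model/iris-classifier/Production/invocations"
DATABRICKS_API_TOKEN = "<personal-access-token>"

def create_tf_serving_json(data):
  # Wrap a NumPy array (or a dict of named arrays) in TensorFlow Serving's {"inputs": ...} format.
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(model_uri, databricks_token, data):
  headers = {
    "Authorization": f"Bearer {databricks_token}",
    "Content-Type": "application/json",
  }
  # DataFrames are sent as a records-oriented JSON list; arrays are sent as tensor input.
  data_json = data.to_dict(orient='records') if isinstance(data, pd.DataFrame) else create_tf_serving_json(data)
  response = requests.request(method='POST', headers=headers, url=model_uri, json=data_json)
  if response.status_code != 200:
    raise Exception(f"Request failed with status {response.status_code}, {response.text}")
  return response.json()

# Scoring a model that accepts pandas DataFrames
data = pd.DataFrame([{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}])
score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)

# Scoring a model that accepts tensors
data = np.asarray([[5.1, 3.5, 1.4, 0.2]])
score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)
```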

You can score a dataset in Power BI Desktop using the following steps:

  1. Open the dataset you want to score.

  2. Go to Transform Data.

  3. Right-click in the left panel and select Create New Query.

  4. Go to View > Advanced Editor.

  5. Replace the query body with the code snippet below, substituting your own values (as quoted text strings) for DATABRICKS_API_TOKEN and MODEL_VERSION_URI.

    ```
    (dataset as table ) as table =>
    let
      // Send one chunk of rows to the model and return its list of predictions.
      call_predict = (dataset as table ) as list =>
      let
        apiToken = DATABRICKS_API_TOKEN,
        modelUri = MODEL_VERSION_URI,
        responseList = Json.Document(Web.Contents(modelUri,
          [
            Headers = [
              #"Content-Type" = "application/json",
              #"Authorization" = Text.Format("Bearer #{0}", {apiToken})
            ],
            Content = Json.FromValue(dataset)
          ]
        ))
      in
        responseList,
      // Score the dataset in 256-row chunks, then join predictions back by row index.
      predictionList = List.Combine(List.Transform(Table.Split(dataset, 256), (x) => call_predict(x))),
      predictionsTable = Table.FromList(predictionList, (x) => {x}, {"Prediction"}),
      datasetWithPrediction = Table.Join(
        Table.AddIndexColumn(predictionsTable, "index"), "index",
        Table.AddIndexColumn(dataset, "index"), "index")
    in
      datasetWithPrediction
    ```

  6. Name the query with your desired model name.

  7. Open the advanced query editor for your dataset and apply the model function.
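The Power BI query in step 5 splits the dataset into chunks of 256 rows, scores each chunk with a separate request, and concatenates the predictions in order. The same batching pattern can be sketched in Python. Here `score_batch` is any callable that takes a list of row records and returns a list of predictions (for example, a wrapper around the `score_model` function from the REST API section). The helper name `score_in_batches` is hypothetical, and the sketch assumes, as the Power BI query does, that the server returns one prediction per input row as a JSON list.

```python
from typing import Callable, List, Sequence

def score_in_batches(records: Sequence[dict],
                     score_batch: Callable[[List[dict]], list],
                     batch_size: int = 256) -> list:
    """Score `records` in chunks of `batch_size` and return predictions in input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    predictions = []
    for start in range(0, len(records), batch_size):
        batch = list(records[start:start + batch_size])
        batch_predictions = score_batch(batch)
        # Assumption: one prediction per input row, in the same order.
        if len(batch_predictions) != len(batch):
            raise ValueError("prediction count does not match batch size")
        predictions.extend(batch_predictions)
    return predictions
```

A possible usage is `score_in_batches(rows, lambda b: score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, pd.DataFrame(b)))`. Smaller batches also keep each request well under the 16 MB payload limit.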

For more information about input data formats accepted by the server (for example, pandas split-oriented format), see the MLflow documentation.
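For comparison with the records-oriented payload used in the examples above (a JSON list with one object per row), the pandas split-oriented format sends the column names once and the row values as a nested list. The sketch below converts records to the split-oriented shape without requiring pandas; the helper name `records_to_split` is hypothetical. In MLflow 1.x, split-oriented JSON is typically sent with the content type `application/json; format=pandas-split`; confirm the accepted content types in the MLflow documentation for your version.

```python
def records_to_split(records):
    """Convert a list of row dicts to pandas split-oriented JSON:
    {"columns": [...], "data": [[...], ...]}. Hypothetical helper.

    Column order follows the first record; every record must have the same keys.
    """
    if not records:
        return {"columns": [], "data": []}
    columns = list(records[0].keys())
    data = []
    for row in records:
        if set(row.keys()) != set(columns):
            raise ValueError("all records must have the same keys")
        data.append([row[c] for c in columns])
    return {"columns": columns, "data": data}
```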

Monitor served models

The serving page displays status indicators for the serving cluster as well as individual model versions.

  • To inspect the state of the serving cluster, use the Model Events tab, which displays a list of all serving events for this model.
  • To inspect the state of a single model version, click the Model Versions tab and scroll to view the Logs or Version Events tabs.

Customize the serving cluster

To customize the serving cluster, use the Cluster Settings tab on the Serving tab.

  • To modify the memory size and number of cores of a serving cluster, use the Instance Type drop-down menu to select the desired cluster configuration. When you click Save, the existing cluster is terminated and a new cluster is created with the specified settings.
  • To add a tag, type the name and value in the Add Tag fields and click Add.
  • To edit or delete an existing tag, click one of the icons in the Actions column of the Tags table.

Known errors

ResolvePackageNotFound: pyspark=3.1.0

This error can occur if a model depends on pyspark and is logged using Databricks Runtime 8.x. If you see this error, specify the pyspark version explicitly when logging the model, using the `conda_env` parameter.
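A minimal sketch of that workaround, assuming the model's conda environment is available as a dictionary in the format accepted by the `conda_env` parameter: the hypothetical helper `pin_pyspark` removes any existing `pyspark` entry (conda-level or pip-level) and adds an explicit pip pin. The version `3.1.1` in the usage line is a placeholder; use a pyspark version that matches your runtime and can actually be installed.

```python
import copy
import re

# Matches "pyspark", "pyspark=3.1.0", "pyspark==3.1.0", etc., but not "pyspark-stubs".
_PYSPARK_SPEC = re.compile(r"^pyspark(\s*[=<>!~].*)?$")

def pin_pyspark(conda_env, version):
    """Return a copy of `conda_env` with pyspark pinned to `version` via pip.

    Hypothetical helper, shown only to illustrate the workaround.
    """
    env = copy.deepcopy(conda_env)
    deps = []
    pip_deps = None
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            # Drop any pip-level pyspark entry; the pin is appended below.
            pip_deps = [p for p in dep["pip"] if not _PYSPARK_SPEC.match(p.strip())]
            dep["pip"] = pip_deps
            deps.append(dep)
        elif isinstance(dep, str) and _PYSPARK_SPEC.match(dep.strip()):
            continue  # drop the conda-level pyspark entry that fails to resolve
        else:
            deps.append(dep)
    if pip_deps is None:
        # No pip section yet: add pip itself and an empty pip dependency list.
        if "pip" not in deps:
            deps.append("pip")
        pip_deps = []
        deps.append({"pip": pip_deps})
    pip_deps.append(f"pyspark=={version}")
    env["dependencies"] = deps
    return env
```

The resulting dictionary would then be passed as `conda_env` when logging the model, for example `conda_env=pin_pyspark(env, "3.1.1")`.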