MLflow Model Serving on Databricks


This feature is in Public Preview.

MLflow Model Serving allows you to host machine learning models from Model Registry as REST endpoints that are updated automatically based on the availability of model versions and their stages.

When you enable model serving for a given registered model, Databricks automatically creates a unique single-node cluster for the model and deploys all non-archived versions of the model on that cluster. Databricks restarts the cluster if any error occurs, and terminates the cluster when you disable model serving for the model. Model serving automatically syncs with Model Registry and deploys any new registered model versions. Deployed model versions can be queried with standard REST API request. Databricks authenticates requests to the model using its standard authentication.

While this service is in preview, Databricks recommends its use for low throughput and non-critical applications. Target throughput is 20 qps and target availability is 99.5%, although no guarantee is made as to either. Additionally, there is a payload size limit of 16 MB per request.

Each model version is deployed using MLflow model deployment and runs in a Conda environment specified by its dependencies.


The cluster is maintained as long as serving is enabled, even if no active model version exists. To terminate the serving cluster, disable model serving for the registered model.


MLflow Model Serving is available for Python MLflow models. All model dependencies must be declared in the conda environment.

Enable and disable model serving

You enable a model for serving from its registered model page.

  1. Click the Serving tab. If the model is not already enabled for serving, the Enable Serving button appears.
  2. Click Enable Serving. The Serving tab appears with the Status as Pending. After a few minutes, the Status changes to Ready.

To disable a model for serving, click Stop.

Model serving from Model Registry

You enable serving of a registered model in Model Registry UI.

Enable serving

Model version URIs

Each deployed model version is assigned one or several unique URIs. At minimum, each model version is assigned a URI constructed as follows:


For example, to call version 1 of a model registered as iris-classifier at, use this URI:

You can also call a model version by its stage. For example, if version 1 is in the Production stage, it can also be scored using this URI:

The list of available model URIs appears at the top of the Model Versions tab on the serving page.

Manage served versions

All active (non-archived) model versions are deployed, and you can query them using the URIs. Databricks automatically deploys new model versions when they are registered, and automatically removes old versions when they are archived.


All deployed versions of a registered model share the same cluster.

Manage model access rights

Model access rights are inherited from the Model Registry. Enabling or disabling the serving feature requires ‘manage’ permission on the registered model. Anyone with read rights can score any of the deployed versions.

Score deployed model versions

To score a deployed model, you can use the UI or send a REST API request to the model URI.

Score via UI

This is the easiest and fastest way to test the model. You can insert the model input data in JSON format and click Send Request. If the model has been logged with an input example (as shown in the graphic above), click Load Example to load the input example.

Score via REST API request

You can send a scoring request through the REST API using standard Databricks authentication. The examples below demonstrate authentication using a personal access token.

Given a MODEL_VERSION_URI like https://<databricks-instance>/model/iris-classifier/Production/invocations (where <databricks-instance> is the name of your Databricks instance) and a Databricks REST API token called DATABRICKS_API_TOKEN, here are some example snippets of how to query a served model:

  -H 'Content-Type: application/json; format=pandas-records' \
  -d '[
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2
import requests

def score_model(model_uri, databricks_token, data):
  headers = {
    "Authorization": f"Bearer {databricks_token}",
    "Content-Type": "application/json; format=pandas-records",
  data_json = data if isinstance(data, list) else data.to_dict(orient="records")
  response = requests.request(method='POST', headers=headers, url=model_uri, json=data_json)
  if response.status_code != 200:
      raise Exception(f"Request failed with status {response.status_code}, {response.text}")
  return response.json()

data = [{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2

# can also score DataFrames
import pandas as pd
score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, pd.DataFrame(data))

You can score a dataset in Power BI Desktop using the following steps:

  1. Open dataset you want to score.

  2. Go to Transform Data.

  3. Right-click in the left panel and select Create New Query.

  4. Go to View > Advanced Editor.

  5. Replace the query body with the code snippet below, after filling in an appropriate DATABRICKS_API_TOKEN and MODEL_VERSION_URI.

    (dataset as table ) as table =>
      call_predict = (dataset as table ) as list =>
        apiToken = DATABRICKS_API_TOKEN,
        modelUri = MODEL_VERSION_URI,
        responseList = Json.Document(Web.Contents(modelUri,
            Headers = [
              #"Content-Type" = "application/json; format=pandas-records",
              #"Authorization" = Text.Format("Bearer #{0}", {apiToken})
            Content = Json.FromValue(dataset)
      predictionList = List.Combine(List.Transform(Table.Split(dataset, 256), (x) => call_predict(x))),
      predictionsTable = Table.FromList(predictionList, (x) => {x}, {"Prediction"}),
      datasetWithPrediction = Table.Join(
        Table.AddIndexColumn(predictionsTable, "index"), "index",
        Table.AddIndexColumn(dataset, "index"), "index")
  6. Name the query with your desired model name.

  7. Open the advanced query editor for your dataset and apply the model function.

For more information about input data formats accepted by the server (for example, pandas split-oriented format), see the MLflow documentation.

Monitor served models

The serving page displays status indicators for the serving cluster as well as individual model versions. In addition, you can use the following to obtain further information:

  • To inspect the state of the serving cluster, use the Model Events tab, which displays a list of all serving events for this model.
  • To inspect the state of a single model version, use the Logs or Version Events tabs on the Model Versions tab.
Version status
Model events