Send scoring requests to serving endpoints

In this article, you learn how to format scoring requests for your served model, and how to send those requests to the model serving endpoint. See Model serving with Databricks.

To score a deployed model, you can send a REST API request to the model URL or use the UI.

To score a model through the API, send a POST request to the following URI:

POST /serving-endpoints/{endpoint-name}/invocations

See Query individual models behind an endpoint for how to send requests for a specific model behind an endpoint.

Request format

Requests are sent as a JSON object containing one of the supported keys, with a value that matches the corresponding input format. The following is the recommended format.

The dataframe_split format is a JSON-serialized pandas DataFrame in the split orientation.

  {
    "dataframe_split": {
      "index": [0, 1],
      "columns": ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"],
      "data": [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
    }
  }
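As a sketch (assuming pandas is installed and the column names are illustrative), this payload can be built from a pandas DataFrame with `to_dict(orient="split")`:

```python
import json

import pandas as pd

# Illustrative iris-style rows; column names match the example above.
df = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    columns=[
        "sepal length (cm)", "sepal width (cm)",
        "petal length (cm)", "petal width (cm)",
    ],
)

# orient="split" produces the index/columns/data layout shown above.
payload = json.dumps({"dataframe_split": df.to_dict(orient="split")})
print(payload)
```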

The dataframe_records format is a JSON-serialized pandas DataFrame in the records orientation. Although supported, this format is less commonly used.

Note

This format does not guarantee the preservation of column ordering, and the split format is preferred over the records format.

{
  "dataframe_records": [
  {
     "sepal length (cm)": 5.1,
     "sepal width (cm)": 3.5,
     "petal length (cm)": 1.4,
     "petal width (cm)": 0.2
  },
  {
     "sepal length (cm)": 4.9,
     "sepal width (cm)": 3,
     "petal length (cm)": 1.4,
     "petal width (cm)": 0.2
   },
   {
     "sepal length (cm)": 4.7,
     "sepal width (cm)": 3.2,
     "petal length (cm)": 1.3,
     "petal width (cm)": 0.2
   }
  ]
}
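As with the split format, this payload can be sketched from a pandas DataFrame, here with `to_dict(orient="records")` (pandas assumed available; the column names are illustrative):

```python
import json

import pandas as pd

df = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]],
    columns=[
        "sepal length (cm)", "sepal width (cm)",
        "petal length (cm)", "petal width (cm)",
    ],
)

# orient="records" emits one JSON object per row, as in the example above.
payload = json.dumps({"dataframe_records": df.to_dict(orient="records")})
print(payload)
```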

Tensors format

When your model expects tensors, as TensorFlow or PyTorch models do, there are two supported format options for sending requests: instances and inputs.

If you have multiple named tensors per row, then you must include one of each tensor for every row.

  • instances is a tensors-based format that accepts tensors in row format. Use this format if all the input tensors have the same 0-th dimension. Conceptually, each tensor in the instances list can be joined with the other tensors of the same name in the rest of the list to construct the full input tensor for the model, which is only possible if all of the tensors have the same 0-th dimension.

      {"instances": [ 1, 2, 3 ]}
    

    The following example shows how to specify multiple named tensors.

    {
     "instances": [
      {
       "t1": "a",
       "t2": [1, 2, 3, 4, 5],
       "t3": [[1, 2], [3, 4], [5, 6]]
      },
      {
       "t1": "b",
       "t2": [6, 7, 8, 9, 10],
       "t3": [[7, 8], [9, 10], [11, 12]]
      }
     ]
    }
    
  • inputs sends queries with tensors in columnar format. Use this format when the named tensors do not all share the same 0-th dimension, since such inputs cannot be represented in the instances format. The following example expresses the same named tensors as the previous example in columnar form.

    {
     "inputs": {
      "t1": ["a", "b"],
      "t2": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
      "t3": [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
     }
    }
    
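As a sketch (assuming NumPy), both tensor payloads can be built from the same named arrays; the row-oriented instances form works here because every tensor shares a 0-th dimension of 2:

```python
import numpy as np

# The named tensors from the examples above.
tensors = {
    "t1": np.array(["a", "b"]),
    "t2": np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]),
    "t3": np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]),
}

# Columnar "inputs" payload: one full tensor per name.
inputs_payload = {"inputs": {name: arr.tolist() for name, arr in tensors.items()}}

# Row-oriented "instances" payload: one record per row.
instances_payload = {
    "instances": [
        {name: arr[i].tolist() for name, arr in tensors.items()}
        for i in range(2)
    ]
}
print(instances_payload["instances"][0])
```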

Response format

The response from the endpoint is in the following format. The output from your model is wrapped in a predictions key. Concretely, the output is almost always a list, and frequently a list of numbers.

{
  "predictions": [0,1,1,1,0]
}
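A client can then unwrap the predictions key from the JSON body (a minimal sketch with an illustrative response string):

```python
import json

# Illustrative raw response body from an endpoint.
body = '{"predictions": [0, 1, 1, 1, 0]}'
predictions = json.loads(body)["predictions"]
print(predictions)  # [0, 1, 1, 1, 0]
```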

Send scoring requests with the UI

Sending requests using the UI is the easiest and fastest way to test the model.

  1. From the Serving endpoint page, select Query endpoint.

  2. Insert the model input data in JSON format and click Send Request.

  3. If the model has been logged with an input example, click Show Example to load the input example.

Send scoring requests with the API

You can send a scoring request through the REST API using standard Databricks authentication. The following examples demonstrate authentication using a personal access token.

Note

As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens or personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.

Given a MODEL_VERSION_URI like https://<databricks-instance>/model/iris-classifier/Production/invocations, where <databricks-instance> is the name of your Databricks instance, and a Databricks REST API token called DATABRICKS_API_TOKEN, the following are example snippets of how to score a served model.

Score a model that accepts the dataframe_records input format.

curl -X POST -u token:$DATABRICKS_API_TOKEN $MODEL_VERSION_URI \
  -H 'Content-Type: application/json' \
  -d '{"dataframe_records": [
    {
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2
    }
  ]}'

Score a model accepting tensor inputs. Tensor inputs should be formatted as described in TensorFlow Serving’s API docs.

curl -X POST -u token:$DATABRICKS_API_TOKEN $MODEL_VERSION_URI \
   -H 'Content-Type: application/json' \
   -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}'
The following Python example defines a helper that scores either pandas DataFrame or tensor input:

import json

import numpy as np
import pandas as pd
import requests

def create_tf_serving_json(data):
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(model_uri, databricks_token, data):
  headers = {
    "Authorization": f"Bearer {databricks_token}",
    "Content-Type": "application/json",
  }
  # Serialize DataFrames to the dataframe_records format; everything else
  # is treated as tensor input and wrapped in the TF Serving "inputs" format.
  data_dict = {'dataframe_records': data.to_dict(orient='records')} if isinstance(data, pd.DataFrame) else create_tf_serving_json(data)
  response = requests.post(url=model_uri, headers=headers, data=json.dumps(data_dict))
  if response.status_code != 200:
      raise Exception(f"Request failed with status {response.status_code}, {response.text}")
  return response.json()


# Scoring a model that accepts pandas DataFrames
data =  pd.DataFrame([{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}])
score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)

# Scoring a model that accepts tensors
data = np.asarray([[5.1, 3.5, 1.4, 0.2]])
score_model(MODEL_VERSION_URI, DATABRICKS_API_TOKEN, data)

You can score a dataset in Power BI Desktop using the following steps:

  1. Open the dataset you want to score.

  2. Go to Transform Data.

  3. Right-click in the left panel and select Create New Query.

  4. Go to View > Advanced Editor.

  5. Replace the query body with the code snippet below, after filling in an appropriate DATABRICKS_API_TOKEN and MODEL_VERSION_URI.

    (dataset as table ) as table =>
    let
      call_predict = (dataset as table ) as list =>
      let
        apiToken = DATABRICKS_API_TOKEN,
        modelUri = MODEL_VERSION_URI,
        responseList = Json.Document(Web.Contents(modelUri,
          [
            Headers = [
              #"Content-Type" = "application/json",
              #"Authorization" = Text.Format("Bearer #{0}", {apiToken})
            ],
            Content = Json.FromValue([dataframe_records = dataset])
          ]
        ))[predictions]
      in
        responseList,
      predictionList = List.Combine(List.Transform(Table.Split(dataset, 256), (x) => call_predict(x))),
      predictionsTable = Table.FromList(predictionList, (x) => {x}, {"Prediction"}),
      datasetWithPrediction = Table.Join(
        Table.AddIndexColumn(predictionsTable, "index"), "index",
        Table.AddIndexColumn(dataset, "index"), "index")
    in
      datasetWithPrediction
    
  6. Name the query with your desired model name.

  7. Open the advanced query editor for your dataset and apply the model function.

Notebook example

See the following notebook for an example of how to test your Model Serving endpoint with a Python model:

Test Model Serving endpoint notebook
