Batch inference using Foundation Model APIs
This article provides example notebooks that perform batch inference on a provisioned throughput endpoint using Foundation Model APIs. Both notebooks are required to accomplish batch inference.
The examples demonstrate batch inference using the DBRX Instruct model for chat tasks.
Requirements
A workspace in a Foundation Model APIs supported region
Databricks Runtime 14.0 ML or above
The provisioned-throughput-batch-inference notebook and the chat-batch-inference-api notebook must exist in the same directory in the workspace
Set up input table and batch inference
The chat-batch-inference-api notebook does the following tasks using Python; a sketch of this flow appears after the list:
Reads data from the input table and input column
Constructs the requests and sends them to a Foundation Model APIs endpoint
Persists input rows together with the response data to the output table
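As a rough illustration of this flow, the sketch below uses the MLflow Deployments client, which is one way to query a serving endpoint from a Databricks notebook. The table names (samples.batch.prompts, samples.batch.responses), the prompt column, and the endpoint name dbrx-batch-endpoint are hypothetical placeholders; this is not the example notebook itself.

```python
# Minimal sketch, assuming hypothetical table and endpoint names.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Read the input table and column (the `spark` session is provided
# automatically in Databricks notebooks).
input_df = spark.table("samples.batch.prompts").select("prompt")

results = []
for row in input_df.toLocalIterator():
    # Construct a chat request and send it to the Foundation Model APIs
    # endpoint. Requests are sent sequentially here for simplicity.
    response = client.predict(
        endpoint="dbrx-batch-endpoint",
        inputs={
            "messages": [{"role": "user", "content": row["prompt"]}],
            "max_tokens": 256,
        },
    )
    # Keep the input row together with the model's response.
    results.append((row["prompt"], response["choices"][0]["message"]["content"]))

# Persist input rows and responses to the output table.
output_df = spark.createDataFrame(results, schema="prompt string, response string")
output_df.write.mode("append").saveAsTable("samples.batch.responses")
```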
The chat-batch-inference-udf notebook does the same tasks using Spark; a sketch follows the list:
Reads data from the input table and input column
Constructs the requests and sends them to a Foundation Model APIs endpoint
Persists input rows together with the response data to the output table
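A Spark version of this flow can distribute the endpoint calls across workers with a pandas UDF. The sketch below assumes the same hypothetical table and endpoint names as above, plus a workspace URL and an access token (stored in a hypothetical secret scope) that you must supply; it illustrates the pattern and is not the chat-batch-inference-udf notebook itself.

```python
# Minimal sketch of the Spark flow, assuming hypothetical names throughout.
import pandas as pd
import requests
from pyspark.sql.functions import pandas_udf

WORKSPACE_URL = "https://<workspace-host>"  # supply your workspace URL
TOKEN = dbutils.secrets.get(scope="batch-inference", key="token")  # hypothetical secret
ENDPOINT = "dbrx-batch-endpoint"  # hypothetical endpoint name

@pandas_udf("string")
def chat_completion(prompts: pd.Series) -> pd.Series:
    def call_endpoint(prompt: str) -> str:
        # Construct the chat request and send it to the serving endpoint.
        resp = requests.post(
            f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 256,
            },
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    return prompts.map(call_endpoint)

# Read the input table, apply the UDF, and persist rows plus responses.
input_df = spark.table("samples.batch.prompts")
output_df = input_df.withColumn("response", chat_completion("prompt"))
output_df.write.mode("append").saveAsTable("samples.batch.responses")
```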
Create provisioned throughput endpoint
The following notebook does these tasks; a sketch of the orchestration appears after the list. If you want to use the Spark notebook instead of the Python notebook, be sure to update the command that calls the Python notebook.
Creates a provisioned throughput serving endpoint
Monitors the endpoint until it reaches a ready state
Calls the chat-batch-inference-api notebook to run batch inference tasks concurrently against the prepared endpoint. If you prefer to use Spark, change this reference to call the chat-batch-inference-udf notebook.
Deletes the provisioned throughput serving endpoint after batch inference completes
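As a hedged sketch of this orchestration, the following uses the Databricks serving endpoints REST API to create, monitor, and delete the endpoint, and dbutils.notebook.run to invoke the batch inference notebook. The model entity path, throughput values, endpoint name, and secret scope are illustrative assumptions, not values from the example notebooks.

```python
# Minimal orchestration sketch, assuming hypothetical names and values.
import time
import requests

WORKSPACE_URL = "https://<workspace-host>"  # supply your workspace URL
TOKEN = dbutils.secrets.get(scope="batch-inference", key="token")  # hypothetical secret
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
ENDPOINT = "dbrx-batch-endpoint"  # hypothetical endpoint name

# Create a provisioned throughput serving endpoint. The entity path and
# throughput values are illustrative; use the increments valid for your model.
requests.post(
    f"{WORKSPACE_URL}/api/2.0/serving-endpoints",
    headers=HEADERS,
    json={
        "name": ENDPOINT,
        "config": {
            "served_entities": [{
                "entity_name": "system.ai.dbrx_instruct",  # hypothetical model path
                "entity_version": "1",
                "min_provisioned_throughput": 0,
                "max_provisioned_throughput": 9500,  # illustrative value
            }]
        },
    },
).raise_for_status()

# Monitor the endpoint until it reaches a ready state.
while True:
    state = requests.get(
        f"{WORKSPACE_URL}/api/2.0/serving-endpoints/{ENDPOINT}", headers=HEADERS
    ).json()["state"]
    if state.get("ready") == "READY":
        break
    time.sleep(30)

# Run the batch inference notebook from the same directory
# (swap in chat-batch-inference-udf to use the Spark version).
dbutils.notebook.run("chat-batch-inference-api", timeout_seconds=0)

# Delete the endpoint after batch inference completes.
requests.delete(
    f"{WORKSPACE_URL}/api/2.0/serving-endpoints/{ENDPOINT}", headers=HEADERS
).raise_for_status()
```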