Batch inference using Foundation Model APIs
This article provides example notebooks that perform batch inference on a provisioned throughput endpoint using Foundation Model APIs. Both notebooks are required to accomplish batch inference.
The examples demonstrate batch inference using the DBRX Instruct model for chat tasks.
Requirements
A workspace in a Foundation Model APIs supported region
Databricks Runtime 14.0 ML or above
The provisioned-throughput-batch-inference notebook and the chat-batch-inference-api notebook must exist in the same directory in the workspace
Set up input table and batch inference
The chat-batch-inference-api notebook does the following tasks using Python; a sketch of this flow appears after the list:
Reads data from the input table and input column
Constructs the requests and sends them to a Foundation Model APIs endpoint
Persists input rows together with the response data to the output table
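As a rough illustration of this flow, the sketch below uses the MLflow Deployments client, which is one way to query a serving endpoint from a Databricks notebook. The table names (samples.batch.prompts, samples.batch.responses), the prompt column, and the endpoint name dbrx-batch-endpoint are hypothetical placeholders; this is not the example notebook itself.

```python
# Minimal sketch, assuming hypothetical table and endpoint names.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Read the input table and column (the `spark` session is provided
# automatically in Databricks notebooks).
input_df = spark.table("samples.batch.prompts").select("prompt")

results = []
for row in input_df.toLocalIterator():
    # Construct a chat request and send it to the Foundation Model APIs
    # endpoint. Requests are sent sequentially here for simplicity.
    response = client.predict(
        endpoint="dbrx-batch-endpoint",
        inputs={
            "messages": [{"role": "user", "content": row["prompt"]}],
            "max_tokens": 256,
        },
    )
    # Keep the input row together with the model's response.
    results.append((row["prompt"], response["choices"][0]["message"]["content"]))

# Persist input rows and responses to the output table.
output_df = spark.createDataFrame(results, schema="prompt string, response string")
output_df.write.mode("append").saveAsTable("samples.batch.responses")
```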
The chat-batch-inference-udf notebook does the same tasks using Spark; a sketch follows the list:
Reads data from the input table and input column
Constructs the requests and sends them to a Foundation Model APIs endpoint
Persists input rows together with the response data to the output table
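A Spark version of this flow can distribute the endpoint calls across workers with a pandas UDF. The sketch below assumes the same hypothetical table and endpoint names as above, plus a workspace URL and an access token (stored in a hypothetical secret scope) that you must supply; it illustrates the pattern and is not the chat-batch-inference-udf notebook itself.

```python
# Minimal sketch of the Spark flow, assuming hypothetical names throughout.
import pandas as pd
import requests
from pyspark.sql.functions import pandas_udf

WORKSPACE_URL = "https://<workspace-host>"  # supply your workspace URL
TOKEN = dbutils.secrets.get(scope="batch-inference", key="token")  # hypothetical secret
ENDPOINT = "dbrx-batch-endpoint"  # hypothetical endpoint name

@pandas_udf("string")
def chat_completion(prompts: pd.Series) -> pd.Series:
    def call_endpoint(prompt: str) -> str:
        # Construct the chat request and send it to the serving endpoint.
        resp = requests.post(
            f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 256,
            },
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    return prompts.map(call_endpoint)

# Read the input table, apply the UDF, and persist rows plus responses.
input_df = spark.table("samples.batch.prompts")
output_df = input_df.withColumn("response", chat_completion("prompt"))
output_df.write.mode("append").saveAsTable("samples.batch.responses")
```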
Create provisioned throughput endpoint
The following notebook does these tasks; a sketch of the orchestration appears after the list. If you want to use the Spark notebook instead of the Python notebook, be sure to update the command that calls the Python notebook.
Creates a provisioned throughput serving endpoint
Monitors the endpoint until it reaches a ready state
Calls the chat-batch-inference-api notebook to run batch inference tasks concurrently against the prepared endpoint. If you prefer to use Spark, change this reference to call the chat-batch-inference-udf notebook.
Deletes the provisioned throughput serving endpoint after batch inference completes
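As a hedged sketch of this orchestration, the following uses the Databricks serving endpoints REST API to create, monitor, and delete the endpoint, and dbutils.notebook.run to invoke the batch inference notebook. The model entity path, throughput values, endpoint name, and secret scope are illustrative assumptions, not values from the example notebooks.

```python
# Minimal orchestration sketch, assuming hypothetical names and values.
import time
import requests

WORKSPACE_URL = "https://<workspace-host>"  # supply your workspace URL
TOKEN = dbutils.secrets.get(scope="batch-inference", key="token")  # hypothetical secret
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
ENDPOINT = "dbrx-batch-endpoint"  # hypothetical endpoint name

# Create a provisioned throughput serving endpoint. The entity path and
# throughput values are illustrative; use the increments valid for your model.
requests.post(
    f"{WORKSPACE_URL}/api/2.0/serving-endpoints",
    headers=HEADERS,
    json={
        "name": ENDPOINT,
        "config": {
            "served_entities": [{
                "entity_name": "system.ai.dbrx_instruct",  # hypothetical model path
                "entity_version": "1",
                "min_provisioned_throughput": 0,
                "max_provisioned_throughput": 9500,  # illustrative value
            }]
        },
    },
).raise_for_status()

# Monitor the endpoint until it reaches a ready state.
while True:
    state = requests.get(
        f"{WORKSPACE_URL}/api/2.0/serving-endpoints/{ENDPOINT}", headers=HEADERS
    ).json()["state"]
    if state.get("ready") == "READY":
        break
    time.sleep(30)

# Run the batch inference notebook from the same directory
# (swap in chat-batch-inference-udf to use the Spark version).
dbutils.notebook.run("chat-batch-inference-api", timeout_seconds=0)

# Delete the endpoint after batch inference completes.
requests.delete(
    f"{WORKSPACE_URL}/api/2.0/serving-endpoints/{ENDPOINT}", headers=HEADERS
).raise_for_status()
```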