Batch inference using Foundation Model APIs provisioned throughput
This article provides an example notebook that performs batch inference on a provisioned throughput endpoint using Foundation Model APIs and `ai_query`.
Requirements
- A workspace in a Foundation Model APIs supported region.
- One of the following:
  - All-purpose compute with compute size `i3.2xlarge` or larger running Databricks Runtime 15.4 LTS ML or above, with at least 2 workers.
  - A SQL warehouse of size medium or larger.
Run batch inference
Generally, setting up batch inference involves two steps:
1. Creating the endpoint to be used for batch inference (a sketch of this step follows the list).
2. Constructing the batch requests and sending them to the batch inference endpoint using `ai_query` (see the second sketch below).
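The following is a minimal sketch of step 1, creating a provisioned throughput endpoint with the Databricks SDK for Python. The endpoint name, model entity name, entity version, and throughput values are placeholders to adapt to your workspace, and the throughput range must align with the increments supported for the chosen model.

```python
# Minimal sketch: create a provisioned throughput serving endpoint.
# Assumes the Databricks SDK for Python (databricks-sdk) is available in the notebook.
# The endpoint name, entity name/version, and throughput values are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()

w.serving_endpoints.create(
    name="llama-batch-inference",  # placeholder endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="system.ai.meta_llama_v3_1_70b_instruct",  # placeholder registered model
                entity_version="1",                                     # placeholder model version
                min_provisioned_throughput=0,
                max_provisioned_throughput=9500,                        # placeholder tokens/sec band
            )
        ]
    ),
)
```

Endpoint creation can take some time while provisioned throughput capacity is allocated, so confirm the endpoint is in a ready state before sending batch requests.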
The example notebook covers these steps and demonstrates batch inference using the Meta Llama 3.1 70B model.
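For step 2, the following is a minimal sketch of a batch request with `ai_query` from a notebook. It assumes a source Delta table with an `id` and a `text` column and reuses the placeholder endpoint name from the previous sketch; the table names, column names, and prompt are illustrative only.

```python
# Minimal sketch: run batch inference with ai_query and persist the results.
# Assumes a Databricks notebook where `spark` is predefined; the endpoint name,
# table names, column names, and prompt below are placeholders.
results = spark.sql("""
    SELECT
      id,
      text,
      ai_query(
        'llama-batch-inference',
        CONCAT('Summarize the following document in one sentence: ', text)
      ) AS summary
    FROM my_catalog.my_schema.documents
""")

# Write the batch inference output to a table for downstream use.
results.write.mode("overwrite").saveAsTable("my_catalog.my_schema.document_summaries")
```

Because `ai_query` is a built-in SQL function, the same SELECT statement can also be run unchanged on a SQL warehouse.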