Batch inference using Foundation Model APIs provisioned throughput

This article provides an example notebook that performs batch inference against a Foundation Model APIs provisioned throughput endpoint using ai_query.

Requirements

  • A workspace in a Foundation Model APIs supported region.

  • One of the following:

    • All-purpose compute with compute size i3.2xlarge or larger, running Databricks Runtime 15.4 LTS ML or above, with at least two workers.

    • A SQL warehouse of size medium or larger.

Run batch inference

Generally, setting up batch inference involves two steps:

  1. Creating the endpoint to be used for batch inference; a minimal sketch is shown after this list.

  2. Constructing the batch requests and sending them to that endpoint using ai_query, as shown in the second sketch below.
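
For step 1, the following is a minimal sketch of creating a provisioned throughput endpoint from a notebook, assuming the Databricks Python SDK (databricks-sdk) is available in the environment. The endpoint name, Unity Catalog model name, model version, and throughput band are placeholders to replace with your own values.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

    w = WorkspaceClient()  # authenticates as the notebook user in the current workspace

    # Create a provisioned throughput endpoint for a Unity Catalog-registered model.
    # The valid throughput band depends on the model, so check the optimization info
    # for your model before choosing min/max values (placeholders used here).
    w.serving_endpoints.create(
        name="llama-3-1-70b-batch",  # placeholder endpoint name
        config=EndpointCoreConfigInput(
            served_entities=[
                ServedEntityInput(
                    entity_name="system.ai.meta_llama_v3_1_70b_instruct",  # placeholder model
                    entity_version="1",               # placeholder model version
                    min_provisioned_throughput=0,     # placeholder tokens-per-second values
                    max_provisioned_throughput=9500,
                )
            ]
        ),
    )

Endpoint creation is asynchronous, so wait until the endpoint reports a ready state before sending batch requests to it.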

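For step 2, the following is a minimal sketch of constructing the batch requests with ai_query from a notebook cell, assuming a hypothetical input table main.default.reviews with a review_text column and the placeholder endpoint name from the sketch above.

    # `spark` is the SparkSession predefined in Databricks notebooks.
    endpoint_name = "llama-3-1-70b-batch"  # placeholder provisioned throughput endpoint
    input_table = "main.default.reviews"   # hypothetical table with a `review_text` column

    # Apply ai_query to every row of the input table; Spark parallelizes the requests.
    result_df = spark.sql(f"""
        SELECT
            review_text,
            ai_query(
                '{endpoint_name}',
                CONCAT('Summarize the following review in one sentence: ', review_text)
            ) AS summary
        FROM {input_table}
    """)

    # Persist the generated responses for downstream use.
    result_df.write.mode("overwrite").saveAsTable("main.default.review_summaries")
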
The example notebook covers these steps and demonstrates batch inference using the Meta Llama 3.1 70B model.

Batch inference with a provisioned throughput endpoint notebook
