prov-throughput-deepseek-r1-distill-llama(Python)


Serve DeepSeek R1 (Distilled Llama 70B) using provisioned throughput

This notebook demonstrates how to download and register the DeepSeek R1 distilled Llama model in Unity Catalog and deploy it using a Foundation Model APIs provisioned throughput endpoint.

Install the transformers library from Hugging Face

Install Hugging Face transformers
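The install cell itself is not shown here. In a Databricks notebook this step is commonly a `%pip install --upgrade transformers` cell, followed by `dbutils.library.restartPython()`. As a plain-Python sketch of the same step, the helper names below (`pip_install_command`, `install`) are hypothetical and not part of the original notebook:

```python
import subprocess
import sys


def pip_install_command(package="transformers", upgrade=True):
    """Build the pip command for the interpreter running this notebook."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def install(package="transformers"):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Uncomment to actually install (requires network access):
# install("transformers")
```

Using `sys.executable` keeps the install tied to the same Python environment the notebook is running in.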

Download DeepSeek R1 distilled Llama 70B

The following code downloads the DeepSeek R1 distilled Llama 70B model from Hugging Face to local storage on the machine running the notebook.


Set the Hugging Face cache folder to the local SSD drive.

First, download the checkpoint to deploy
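The download cells are not shown here, so the following is a minimal sketch of the two steps rather than the notebook's own code. It assumes the checkpoint is the `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` repository on the Hugging Face Hub and that `/local_disk0/hf` is the local SSD path; both values, and the helper names, are assumptions. It uses `snapshot_download` from `huggingface_hub`, which is installed alongside `transformers`.

```python
import os

# Assumptions: the Hub repository id and the local SSD path are illustrative.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
LOCAL_CACHE_DIR = "/local_disk0/hf"


def set_hf_cache(cache_dir=LOCAL_CACHE_DIR):
    """Point the Hugging Face cache at cache_dir.

    HF_HOME should be set before transformers or huggingface_hub is
    imported, which is why the download function imports lazily.
    """
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


def download_checkpoint(model_id=MODEL_ID, cache_dir=LOCAL_CACHE_DIR):
    """Download every file of the checkpoint and return its local path."""
    set_hf_cache(cache_dir)
    from huggingface_hub import snapshot_download  # lazy import, see above

    return snapshot_download(repo_id=model_id, cache_dir=cache_dir)


# Uncomment to download (large: make sure the disk has enough free space):
# checkpoint_path = download_checkpoint()
```

Downloading the files without loading the weights keeps memory use low at this stage; the returned path can then be passed to the registration step.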

Register the downloaded model to Unity Catalog

The following code shows how to start and log a run that registers the downloaded model to Unity Catalog.

Register your downloaded model in Unity Catalog
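The registration cell is not shown here. As a hedged sketch of what such a cell can look like, the code below logs the model with `mlflow.transformers.log_model` inside an MLflow run and registers it under a three-level Unity Catalog name. The catalog, schema, and model names, the `llm/v1/chat` task, and the helper names are assumptions for illustration, not values taken from the notebook.

```python
def uc_model_name(catalog, schema, model):
    """Build a three-level Unity Catalog name: catalog.schema.model."""
    for part in (catalog, schema, model):
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{catalog}.{schema}.{model}"


def register_model(transformers_model, registered_model_name, task="llm/v1/chat"):
    """Log the model in a run and register it in Unity Catalog.

    transformers_model is passed through unchanged; MLflow accepts a
    pipeline, a dict with "model" and "tokenizer", or (in recent MLflow
    versions) a path to a local checkpoint directory.
    """
    import mlflow  # lazy import so the helper above works without MLflow

    # Register into Unity Catalog rather than the workspace registry.
    mlflow.set_registry_uri("databricks-uc")
    with mlflow.start_run():
        return mlflow.transformers.log_model(
            transformers_model=transformers_model,
            artifact_path="model",
            task=task,
            registered_model_name=registered_model_name,
        )


# Hypothetical names; replace with a catalog and schema you can write to.
name = uc_model_name("main", "default", "deepseek_r1_distill_llama_70b")
# model_info = register_model(checkpoint_path, name)
```

Here `checkpoint_path` stands for whatever the download step produced. Logging a 70B checkpoint copies a large amount of data, so this step can take a long time.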

Create a provisioned throughput endpoint for model serving

The following code shows how to create a provisioned throughput model serving endpoint to serve the DeepSeek R1 distilled Llama 70B model that you downloaded and registered in Unity Catalog.
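The endpoint-creation cell is not shown here. The sketch below illustrates one way to do it with the Databricks serving-endpoints REST API: query the model's optimization info for its throughput chunk size, then create an endpoint whose minimum and maximum provisioned throughput are multiples of that chunk. The endpoint name, model name, version, chunk size, and helper names are assumptions, and the response field `throughput_chunk_size` should be checked against the current API reference.

```python
def build_endpoint_payload(endpoint_name, model_name, model_version,
                           min_throughput, max_throughput):
    """Build the JSON body for POST /api/2.0/serving-endpoints."""
    if min_throughput > max_throughput:
        raise ValueError("min_throughput must not exceed max_throughput")
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(model_version),
                    "min_provisioned_throughput": min_throughput,
                    "max_provisioned_throughput": max_throughput,
                }
            ]
        },
    }


def get_optimization_info(host, token, model_name, model_version):
    """Fetch provisioned throughput details for a registered model version."""
    import requests  # lazy import; only needed for the HTTP calls

    url = (f"{host}/api/2.0/serving-endpoints/"
           f"get-model-optimization-info/{model_name}/{model_version}")
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()


def create_endpoint(host, token, payload):
    """Create the serving endpoint and return the API response."""
    import requests

    resp = requests.post(f"{host}/api/2.0/serving-endpoints", json=payload,
                         headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()


# Hypothetical values; chunk would normally come from
# get_optimization_info(...)["throughput_chunk_size"].
chunk = 100
payload = build_endpoint_payload(
    "deepseek-r1-distill-llama-70b",
    "main.default.deepseek_r1_distill_llama_70b",
    1, min_throughput=chunk, max_throughput=2 * chunk,
)
# create_endpoint(host, token, payload)
```

Building the payload separately from the HTTP call makes it easy to inspect the configuration before anything is provisioned.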
