External models in Databricks Model Serving

Important

The code examples in this article demonstrate usage of the Public Preview MLflow Deployments CRUD API.

This article describes external models in Databricks Model Serving, including supported model providers and limitations.

What are external models?

External models are third-party models hosted outside of Databricks. Supported by Model Serving, external models allow you to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. You can also use Databricks Model Serving as a provider to serve custom models, which offers rate limits for those endpoints. As part of this support, Model Serving offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM-related requests.

In addition, Databricks support for external models provides centralized credential management. By storing API keys in one secure location, organizations can enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps prevent keys from being exposed in code and relieves end users from having to manage keys themselves.
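For example, the following is a minimal sketch, assuming the databricks-sdk Python package and hypothetical scope and key names, of storing a provider API key as a Databricks secret. Endpoint configurations can then reference the key with the {{secrets/...}} syntax used in the examples later in this article.

from databricks.sdk import WorkspaceClient

# Assumes workspace authentication is already configured for the SDK.
# The scope and key names below are hypothetical examples.
w = WorkspaceClient()

w.secrets.create_scope(scope="my_anthropic_secret_scope")
w.secrets.put_secret(
    scope="my_anthropic_secret_scope",
    key="anthropic_api_key",
    string_value="<your-anthropic-api-key>",
)

# An endpoint configuration can then reference the key without exposing it:
# "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"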

See Tutorial: Create external model endpoints to query OpenAI models for step-by-step guidance on creating external model endpoints and querying the supported models they serve using the MLflow Deployments SDK. You can also perform these tasks with the Serving UI and the REST API; see the corresponding guides for instructions.

Requirements

Model providers

External model support in Model Serving is designed to accommodate a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI or Anthropic. Each provider has specific characteristics and configurations that are encapsulated within the external_model field of the external model endpoint configuration.

The following providers are supported:

  • openai: For models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with AAD.

  • anthropic: For models offered by Anthropic.

  • cohere: For models offered by Cohere.

  • aws-bedrock: For models offered by Amazon Bedrock.

  • palm: For models offered by PaLM.

  • ai21labs: For models offered by AI21Labs.

  • databricks-model-serving: For Databricks Model Serving endpoints with compatible schemas. See Endpoint configuration.

To request support for a provider not listed here, reach out to your Databricks account team.

Supported models

The model you choose directly affects the responses you get from your API calls, so choose a model that fits your use-case requirements. For instance, to generate conversational responses, choose a chat model; to generate embeddings of text, choose an embedding model.

The following table presents a non-exhaustive list of supported models and their corresponding endpoint types. The model associations listed here can serve as a helpful guide when configuring an endpoint for newly released model types as they become available from a given provider. Customers are responsible for ensuring compliance with applicable model licenses.

Note

With the rapid development of LLMs, there is no guarantee that this list is up to date at all times.

| Provider | llm/v1/completions | llm/v1/chat | llm/v1/embeddings |
| --- | --- | --- | --- |
| OpenAI | gpt-3.5-turbo-instruct, babbage-002, davinci-002 | gpt-3.5-turbo, gpt-4 | text-embedding-ada-002 |
| Anthropic | claude-1, claude-1.3-100k, claude-2 | | |
| Cohere | command, command-light-nightly | | embed-english-v2.0, embed-multilingual-v2.0 |
| Azure OpenAI | text-davinci-003, gpt-35-turbo-instruct | gpt-35-turbo, gpt-4 | text-embedding-ada-002 |
| MLflow | MLflow served models | MLflow served models | MLflow served models |
| Amazon Bedrock | Anthropic: claude-instant-v1, claude-v1, claude-v2; Cohere: command-text-v14; AI21 Labs: j2-grande-instruct, j2-jumbo-instruct, j2-mid, j2-mid-v1, j2-ultra, j2-ultra-v1 | | Amazon: titan-embed-text-v1 |
| AI21 Labs | j2-mid, j2-light, j2-ultra | | |
| PaLM | text-bison-001 | chat-bison-001 | embedding-gecko-001 |

Use models served on Databricks Model Serving endpoints

Using Databricks Model Serving endpoints as a provider is supported for the llm/v1/completions, llm/v1/chat, and llm/v1/embeddings endpoint types. These endpoints must accept the standard query parameters marked as required; other parameters might be ignored depending on whether the Databricks Model Serving endpoint supports them.

See POST /serving-endpoints/{name}/invocations in the API reference for standard query parameters.

These endpoints must produce responses in the following OpenAI format.

For completions tasks:

{
  "id": "123", # Not Required
  "model": "test_databricks_model",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For chat tasks:

{
  "id": "123", # Not Required
  "model": "test_chat_model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    },
    {
      "index": 1,
      "message": {
        "role": "human",
        "content": "\n\nWhat is the weather in San Francisco?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For embeddings tasks:

{
  "data": [
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        ... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 0
    },
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        ... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 1
    }
  ],
  "model": "test_embedding_model",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
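As an illustration, the following sketch creates an external model endpoint that forwards chat requests to a compatible Databricks Model Serving endpoint, using the create_endpoint() API covered in the next section. The serving endpoint name, workspace URL, and the databricks_model_serving_config field names here are illustrative assumptions; see the endpoint configuration reference for the exact schema.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="databricks-served-chat-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    # Name of an existing Model Serving endpoint with a
                    # compatible llm/v1/chat schema (hypothetical).
                    "name": "my-databricks-chat-endpoint",
                    "provider": "databricks-model-serving",
                    "task": "llm/v1/chat",
                    # The field names below are assumptions for this sketch.
                    "databricks_model_serving_config": {
                        "databricks_api_token": "{{secrets/my_databricks_secret_scope/databricks_api_token}}",
                        "databricks_workspace_url": "https://my-workspace.cloud.databricks.com"
                    }
                }
            }
        ]
    }
)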

Endpoint configuration

To serve and query external models, you need to configure a serving endpoint. See Create and configure model serving endpoints.

For an external model serving endpoint you must include the external_model field and its parameters in the served_entities section of the endpoint configuration.

The external_model field defines the model to which this endpoint forwards requests. When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers might not. If the model is not supported by the provider, Databricks returns an HTTP 4xx error when trying to route requests to that model.

The following table summarizes the external_model field parameters. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

| Parameter | Description |
| --- | --- |
| name | The name of the model to use. For example, gpt-3.5-turbo for OpenAI’s GPT-3.5-Turbo model. |
| provider | Specifies the name of the provider for this model. This string value must correspond to a supported external model provider. For example, openai for OpenAI’s GPT-3.5 models. |
| task | The task corresponds to the type of language model interaction you want. Supported tasks are llm/v1/completions, llm/v1/chat, and llm/v1/embeddings. |
| <provider>_config | Contains any additional configuration details required for the model, including the API base URL and the API key. See Configure the provider for an endpoint. |

The following is an example of creating an external model endpoint using the create_endpoint() API. In this example, a request sent to the completion endpoint is forwarded to the claude-2 model provided by anthropic.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    }
                }
            }
        ]
    }
)
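Once the endpoint is created, you can query it with the same client's predict() method; the prompt and parameter values here are illustrative:

completion = client.predict(
    endpoint="anthropic-completions-endpoint",
    inputs={"prompt": "What is Databricks?", "max_tokens": 64},
)
print(completion)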

Configure the provider for an endpoint

When you create an endpoint, you must supply the required configurations for the specified model provider. The following sections summarize the available endpoint configuration parameters for each model provider.

OpenAI

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| openai_api_key | The API key for the OpenAI service. | Yes | |
| openai_api_type | An optional field to specify the type of OpenAI API to use. | No | |
| openai_api_base | The base URL for the OpenAI API. | No | https://api.openai.com/v1 |
| openai_api_version | An optional field to specify the OpenAI API version. | No | |
| openai_organization | An optional field to specify the organization in OpenAI. | No | |
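For example, the following is a minimal sketch of an endpoint configuration for the standard (non-Azure) OpenAI service. The endpoint name and secret scope are hypothetical, and only the API key is supplied, since openai_api_base defaults to https://api.openai.com/v1.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="openai-completions-endpoint",  # hypothetical endpoint name
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "gpt-3.5-turbo-instruct",
                    "provider": "openai",
                    "task": "llm/v1/completions",
                    "openai_config": {
                        "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}"
                    }
                }
            }
        ]
    }
)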

Cohere

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| cohere_api_key | The API key for the Cohere service. | Yes | |

Anthropic

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| anthropic_api_key | The API key for the Anthropic service. | Yes | |

Azure OpenAI

Azure OpenAI has distinct features compared with the direct OpenAI service. For an overview, see the comparison documentation.

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| openai_api_key | The API key for the Azure OpenAI service. | Yes | |
| openai_api_type | Adjust this parameter to represent the preferred security access validation protocol. For access token validation, use azure. For authentication using Azure Active Directory (Azure AD), use azuread. | Yes | |
| openai_api_base | The base URL for the Azure OpenAI API service provided by Azure. | Yes | |
| openai_api_version | The version of the Azure OpenAI service to use, specified by a date. | Yes | |
| openai_deployment_name | The name of the deployment resource for the Azure OpenAI service. | Yes | |
| openai_organization | An optional field to specify the organization in OpenAI. | No | |

The following example demonstrates how to create an endpoint with Azure OpenAI:

client.create_endpoint(
    name="openai-chat-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "gpt-3.5-turbo",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_type": "azure",
                        "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                        "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                        "openai_deployment_name": "my-gpt-35-turbo-deployment",
                        "openai_api_version": "2023-05-15"
                    }
                }
            }
        ]
    }
)

Amazon Bedrock

To use Amazon Bedrock as an external model provider, customers need to make sure Bedrock is enabled in the specified AWS region, and that the specified AWS key pair has the appropriate permissions to interact with Bedrock services. For more information, see AWS Identity and Access Management.

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| aws_region | The AWS region to use. Bedrock must be enabled in this region. | Yes | |
| aws_access_key_id | An AWS access key ID with permissions to interact with Bedrock services. | Yes | |
| aws_secret_access_key | The AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services. | Yes | |
| bedrock_provider | The underlying provider in Amazon Bedrock. Supported values (case insensitive) include: Anthropic, Cohere, AI21Labs, Amazon. | Yes | |

The following example demonstrates how to create an endpoint with Amazon Bedrock.

client.create_endpoint(
    name="bedrock-anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-v2",
                    "provider": "aws-bedrock",
                    "task": "llm/v1/completions",
                    "aws_bedrock_config": {
                        "aws_region": "<YOUR_AWS_REGION>",
                        "aws_access_key_id": "{{secrets/my_aws_bedrock_secret_scope/aws_access_key_id}}",
                        "aws_secret_access_key": "{{secrets/my_aws_bedrock_secret_scope/aws_secret_access_key}}",
                        "bedrock_provider": "anthropic",
                    },
                }
            }
        ]
    },
)

If there are AWS permission issues, Databricks recommends that you verify the credentials directly with the Amazon Bedrock API.
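A minimal sketch of such a verification using boto3 follows; this is separate from Databricks Model Serving, and the region, credentials, and prompt are placeholders (the model ID and request body shape follow Anthropic's published Bedrock conventions):

import json

import boto3

# Use the same values you plan to store in the endpoint's aws_bedrock_config.
bedrock = boto3.client(
    "bedrock-runtime",
    region_name="<YOUR_AWS_REGION>",
    aws_access_key_id="<YOUR_ACCESS_KEY_ID>",
    aws_secret_access_key="<YOUR_SECRET_ACCESS_KEY>",
)

# Invoke an Anthropic model directly to confirm the key pair works.
response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({
        "prompt": "\n\nHuman: Hello\n\nAssistant:",
        "max_tokens_to_sample": 32,
    }),
)
print(json.loads(response["body"].read()))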

AI21 Labs

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| ai21labs_api_key | The API key for the AI21 Labs service. | Yes | |

PaLM

| Configuration Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| palm_api_key | The API key for the PaLM service. | Yes | |

Query an external model endpoint

After you create an external model endpoint, it is ready to receive traffic from users.

You can send scoring requests to the endpoint using the OpenAI client, the REST API, or the MLflow Deployments SDK.

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

completion = client.completions.create(
  model="anthropic-completions-endpoint",
  prompt="what is databricks",
  temperature=1.0
)
print(completion)

Expected output response format:

{
  "id": "123", # Not Required
  "model": "anthropic-completions-endpoint",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Additional query parameters

You can pass any additional parameters supported by the endpoint’s provider as part of your query.

For example:

  • logit_bias (supported by OpenAI, Cohere).

  • top_k (supported by Anthropic, Cohere).

  • frequency_penalty (supported by OpenAI, Cohere).

  • presence_penalty (supported by OpenAI, Cohere).

  • stream (supported by OpenAI, Anthropic, Cohere, Amazon Bedrock for Anthropic). This is only available for chat and completions requests.
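For example, the following sketch passes the Anthropic-specific top_k parameter to the completions endpoint created earlier, using the openai-python client's extra_body mechanism for non-standard fields; the token and workspace URL are placeholders:

from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# top_k is not a standard OpenAI parameter, so pass it through extra_body.
completion = client.completions.create(
    model="anthropic-completions-endpoint",
    prompt="what is databricks",
    max_tokens=64,
    extra_body={"top_k": 50},
)
print(completion)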

Limitations

Depending on the external model you choose, your configuration might cause your data to be processed outside of the region where your data originated.

See Model Serving limits and regions.