External models in Mosaic AI Model Serving

important

The code examples in this article demonstrate usage of the Public Preview MLflow Deployments CRUD API.

This article describes external models in Mosaic AI Model Serving, including the supported model providers and limitations.

What are external models?

important

You can now configure Mosaic AI Gateway on model serving endpoints that serve external models. AI Gateway brings governance, monitoring, and production readiness to these model serving endpoints. See AI Gateway.

External models are third-party models hosted outside of Databricks. Supported by Model Serving, external models allow you to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. You can also use Mosaic AI Model Serving as a provider to serve custom models, which offers rate limits for those endpoints. As part of this support, Model Serving offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM-related requests.

In addition, Databricks support for external models provides centralized credential management. By storing API keys in one secure location, organizations can enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps to prevent exposing these keys within code or requiring end users to manage keys safely.

See Tutorial: Create external model endpoints to query OpenAI models for step-by-step guidance on external model endpoint creation and querying supported models served by those endpoints using the MLflow Deployments SDK. See the following guides for instructions on how to use the Serving UI and the REST API:

Requirements

API key or authentication fields for the model provider.
Databricks workspace in External models supported regions.

Model providers

External models in Model Serving are designed to support a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI or Anthropic. Each provider has specific characteristics and configurations that are encapsulated within the external_model field of the external model endpoint configuration.

The following providers are supported:

openai: For models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with Microsoft Entra ID (formerly Azure Active Directory, AAD).
anthropic: For models offered by Anthropic.
cohere: For models offered by Cohere.
amazon-bedrock: For models offered by Amazon Bedrock.
google-cloud-vertex-ai: For models offered by Google Cloud Vertex AI.
databricks-model-serving: For Mosaic AI Model Serving endpoints with compatible schemas. See Endpoint configuration.
custom: For alternate providers or models behind custom proxies that are OpenAI API compatible, but not directly supported by Databricks.

To request support for a provider not listed here, try using the custom provider option or reach out to your Databricks account team.

Supported models

The model you choose directly affects the results of the responses you get from the API calls. Therefore, choose a model that fits your use-case requirements. For instance, for generating conversational responses, you can choose a chat model. Conversely, for generating embeddings of text, you can choose an embedding model.

See supported models.

Use models served on Mosaic AI Model Serving endpoints

Mosaic AI Model Serving endpoints as a provider is supported for the llm/v1/completions, llm/v1/chat, and llm/v1/embeddings endpoint types. These endpoints must accept the standard query parameters marked as required, while other parameters might be ignored depending on whether or not the Mosaic AI Model Serving endpoint supports them.

See POST /serving-endpoints/{name}/invocations in the API reference for standard query parameters.

These endpoints must produce responses in the following OpenAI format.

For completions tasks:

Python
{
  "id": "123", # Not Required
  "model": "test_databricks_model",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For chat tasks:

Python
{
  "id": "123", # Not Required
  "model": "test_chat_model",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "finish_reason": "stop"
  },
  {
    "index": 1,
    "message": {
      "role": "human",
      "content": "\n\nWhat is the weather in San Francisco?",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For embeddings tasks:

Python
{
  "data": [
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222,
      ],
      "index": 0
    },
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222,
      ],
      "index": 1
    }
  ],
  "model": "test_embedding_model",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
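The response shapes above can be checked before you register a Model Serving endpoint as a provider. The following is a minimal sketch, not part of the Databricks API: the function name missing_response_fields is hypothetical, and the set of fields it treats as required is an assumption inferred from the examples above (fields marked "Not Required" are skipped, and for chat only index and message are checked).

```python
from typing import Any, Dict, List

# Per-choice keys checked for each task. This set is an assumption inferred
# from the example responses in this article, not an official schema.
_REQUIRED_CHOICE_KEYS = {
    "llm/v1/completions": ("index", "text"),
    "llm/v1/chat": ("index", "message"),
}


def missing_response_fields(task: str, response: Dict[str, Any]) -> List[str]:
    """Return the names of expected fields that are absent from a response."""
    missing: List[str] = []

    # "model" and "usage" appear in all three example formats.
    for key in ("model", "usage"):
        if key not in response:
            missing.append(key)

    if task == "llm/v1/embeddings":
        items = response.get("data")
        if not items:
            missing.append("data")
        for i, item in enumerate(items or []):
            for key in ("embedding", "index"):
                if key not in item:
                    missing.append(f"data[{i}].{key}")
    elif task in _REQUIRED_CHOICE_KEYS:
        choices = response.get("choices")
        if not choices:
            missing.append("choices")
        for i, choice in enumerate(choices or []):
            for key in _REQUIRED_CHOICE_KEYS[task]:
                if key not in choice:
                    missing.append(f"choices[{i}].{key}")
    else:
        raise ValueError(f"Unsupported task: {task}")

    return missing
```

An empty list means the response carries every field the sketch looks for. It does not validate value types or token counts.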

Endpoint configuration

To serve and query external models, you need to configure a serving endpoint. See Create an external model serving endpoint.

For an external model serving endpoint, you must include the external_model field and its parameters in the served_entities section of the endpoint configuration. If you configure multiple external models in a serving endpoint, you must provide a traffic_config to define the traffic routing percentage for each external model.
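As an illustration of the multi-model case, the following sketch builds a configuration with two external models and a traffic split. The route field names served_model_name and traffic_percentage, and the requirement that percentages sum to 100, are assumptions based on the serving endpoints REST API; confirm them in the POST /api/2.0/serving-endpoints reference. The helper build_traffic_config, the entity names, and the OpenAI model name are hypothetical.

```python
from typing import Any, Dict, List


def build_traffic_config(split: Dict[str, int]) -> Dict[str, Any]:
    """Build a traffic_config from {served_entity_name: percentage}."""
    total = sum(split.values())
    if total != 100:
        # Assumption: the routing percentages must add up to 100.
        raise ValueError(f"Traffic percentages must sum to 100, got {total}")
    routes: List[Dict[str, Any]] = [
        {"served_model_name": name, "traffic_percentage": pct}
        for name, pct in split.items()
    ]
    return {"routes": routes}


# Two served entities that share the same task. Names and models are
# illustrative only.
served_entities = [
    {
        "name": "primary",
        "external_model": {
            "name": "claude-2",
            "provider": "anthropic",
            "task": "llm/v1/completions",
            "anthropic_config": {
                "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
            },
        },
    },
    {
        "name": "fallback",
        "external_model": {
            "name": "gpt-3.5-turbo-instruct",
            "provider": "openai",
            "task": "llm/v1/completions",
            "openai_config": {
                "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}"
            },
        },
    },
]

endpoint_config = {
    "served_entities": served_entities,
    "traffic_config": build_traffic_config({"primary": 80, "fallback": 20}),
}
```

The resulting endpoint_config dictionary has the same shape as the config argument passed to create_endpoint() elsewhere in this article.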

The external_model field defines the model to which this endpoint forwards requests. When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers might not. If the model is not supported by the provider, Databricks returns an HTTP 4xx error when trying to route requests to that model.

The following table summarizes the external_model field parameters. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

Parameter	Description
`name`	The name of the model to use. For example, `gpt-3.5-turbo` for OpenAI's `GPT-3.5-Turbo` model. This is passed in as part of the request body with the corresponding key: `"model"`.
`provider`	Specifies the name of the provider for this model. This string value must correspond to a supported external model provider. For example, `openai` for OpenAI's `GPT-3.5` models.
`task`	The type of language model interaction you want. Supported tasks are `llm/v1/completions`, `llm/v1/chat`, and `llm/v1/embeddings`.
`<provider>_config`	Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key. See Configure the provider for an endpoint. If you are using `custom` provider, specify this parameter as `custom_provider_config`.
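The parameters in the table can be assembled with a small helper. This is a sketch under stated assumptions: build_external_model and provider_config_key are hypothetical names, and the rule for deriving the `<provider>_config` key (hyphens replaced by underscores, with `custom` mapped to `custom_provider_config`) is inferred from the examples in this article rather than taken from an official specification.

```python
from typing import Any, Dict

# Tasks listed in the table above.
SUPPORTED_TASKS = {"llm/v1/completions", "llm/v1/chat", "llm/v1/embeddings"}


def provider_config_key(provider: str) -> str:
    """Derive the name of the provider-specific config field."""
    if provider == "custom":
        return "custom_provider_config"
    # Assumption: other providers use "<provider>_config" with hyphens
    # replaced by underscores, for example "amazon_bedrock_config".
    return provider.replace("-", "_") + "_config"


def build_external_model(
    name: str, provider: str, task: str, provider_config: Dict[str, Any]
) -> Dict[str, Any]:
    """Assemble the external_model field of a served entity."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task: {task}")
    return {
        "name": name,
        "provider": provider,
        "task": task,
        provider_config_key(provider): dict(provider_config),
    }


external_model = build_external_model(
    name="claude-2",
    provider="anthropic",
    task="llm/v1/completions",
    provider_config={
        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
    },
)
```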

The following is an example of creating an external model endpoint using the create_endpoint() API. In this example, a request sent to the completion endpoint is forwarded to the claude-2 model provided by anthropic.

Python
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    }
                }
            }
        ]
    }
)
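After create_endpoint() returns, you might want to confirm that the endpoint is ready before sending traffic. The following sketch polls the MLflow Deployments client's get_endpoint() method. The helper name wait_until_ready is hypothetical, and the assumption that the returned endpoint exposes a state.ready field whose value is "READY" is based on the serving endpoints REST API response; verify it against your workspace.

```python
import time
from typing import Any, Callable


def wait_until_ready(
    client: Any,
    endpoint_name: str,
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll an endpoint until it reports READY or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = client.get_endpoint(endpoint=endpoint_name)
        # Assumption: the endpoint description is dict-like and carries
        # {"state": {"ready": "READY"}} once it can serve requests.
        if endpoint.get("state", {}).get("ready") == "READY":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Example usage with the client created above (requires a workspace):
# ready = wait_until_ready(client, "anthropic-completions-endpoint")
```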

Configure the provider for an endpoint

When you create an endpoint, you must supply the required configurations for the specified model provider. The following sections summarize the available endpoint configuration parameters for each model provider.

note

Databricks encrypts and securely stores the provided credentials for each model provider. These credentials are automatically deleted when their associated endpoints are deleted.

OpenAI

Configuration Parameter	Description	Required	Default
`openai_api_key`	The Databricks secret key reference for an OpenAI API key using the OpenAI service. If you prefer to paste your API key directly, see `openai_api_key_plaintext`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_key_plaintext`	The OpenAI API key using the OpenAI service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `openai_api_key`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_type`	An optional field to specify the type of OpenAI API to use.	No	`openai`
`openai_api_base`	The base URL for the OpenAI API.	No	`https://api.openai.com/v1`
`openai_api_version`	An optional field to specify the OpenAI API version.	No
`openai_organization`	An optional field to specify the organization in OpenAI.	No
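As a sketch of how these parameters fit together for the direct OpenAI service, the following builds an endpoint configuration that relies on the defaults for openai_api_type and openai_api_base. The helper names secret_ref and openai_endpoint_config and the secret scope and key names are hypothetical. The text-embedding-ada-002 model is the one mentioned earlier in this article.

```python
from typing import Any, Dict, Optional


def secret_ref(scope: str, key: str) -> str:
    """Format a Databricks secret reference as used in this article."""
    return "{{secrets/" + scope + "/" + key + "}}"


def openai_endpoint_config(
    model: str,
    task: str,
    api_key_ref: str,
    organization: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an endpoint config for the direct OpenAI service."""
    openai_config: Dict[str, Any] = {"openai_api_key": api_key_ref}
    if organization is not None:
        # Optional field from the table above.
        openai_config["openai_organization"] = organization
    return {
        "served_entities": [
            {
                "external_model": {
                    "name": model,
                    "provider": "openai",
                    "task": task,
                    "openai_config": openai_config,
                }
            }
        ]
    }


config = openai_endpoint_config(
    model="text-embedding-ada-002",
    task="llm/v1/embeddings",
    api_key_ref=secret_ref("my_openai_secret_scope", "openai_api_key"),
)

# Example usage (requires a workspace and the MLflow Deployments client):
# client.create_endpoint(name="openai-embeddings-endpoint", config=config)
```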

Cohere

Configuration Parameter	Description	Required
`cohere_api_key`	The Databricks secret key reference for a Cohere API key. If you prefer to paste your API key directly, see `cohere_api_key_plaintext`.	You must provide an API key using one of the following fields: `cohere_api_key` or `cohere_api_key_plaintext`.
`cohere_api_key_plaintext`	The Cohere API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `cohere_api_key`.	You must provide an API key using one of the following fields: `cohere_api_key` or `cohere_api_key_plaintext`.
`cohere_api_base`	The base URL for the Cohere service.	No

Anthropic

Configuration Parameter	Description	Required	Default
`anthropic_api_key`	The Databricks secret key reference for an Anthropic API key. If you prefer to paste your API key directly, see `anthropic_api_key_plaintext`.	You must provide an API key using one of the following fields: `anthropic_api_key` or `anthropic_api_key_plaintext`.
`anthropic_api_key_plaintext`	The Anthropic API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `anthropic_api_key`.	You must provide an API key using one of the following fields: `anthropic_api_key` or `anthropic_api_key_plaintext`.

Azure OpenAI

Azure OpenAI has distinct features as compared with the direct OpenAI service. For an overview, please see the comparison documentation.

Configuration Parameter	Description	Required
`openai_api_key`	The Databricks secret key reference for an OpenAI API key using the Azure service. If you prefer to paste your API key directly, see `openai_api_key_plaintext`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_key_plaintext`	The OpenAI API key using the Azure service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `openai_api_key`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_type`	Use `azure` for access token validation.	Yes
`openai_api_base`	The base URL for the Azure OpenAI API service provided by Azure.	Yes
`openai_api_version`	The version of the Azure OpenAI service to utilize, specified by a date.	Yes
`openai_deployment_name`	The name of the deployment resource for the Azure OpenAI service.	Yes
`openai_organization`	An optional field to specify the organization in OpenAI.	No

If you are using Azure OpenAI with Microsoft Entra ID, use the following parameters in your endpoint configuration. Databricks passes https://cognitiveservices.azure.com/ as the default scope for the Microsoft Entra ID token.

Configuration Parameter	Description	Required
`microsoft_entra_tenant_id`	The tenant ID for Microsoft Entra ID authentication.	Yes
`microsoft_entra_client_id`	The client ID for Microsoft Entra ID authentication.	Yes
`microsoft_entra_client_secret`	The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication. If you prefer to paste your client secret directly, see `microsoft_entra_client_secret_plaintext`.	You must provide a client secret using one of the following fields: `microsoft_entra_client_secret` or `microsoft_entra_client_secret_plaintext`.
`microsoft_entra_client_secret_plaintext`	The client secret used for Microsoft Entra ID authentication provided as a plaintext string. If you prefer to reference your client secret using Databricks Secrets, see `microsoft_entra_client_secret`.	You must provide a client secret using one of the following fields: `microsoft_entra_client_secret` or `microsoft_entra_client_secret_plaintext`.
`openai_api_type`	Use `azuread` for authentication using Microsoft Entra ID.	Yes
`openai_api_base`	The base URL for the Azure OpenAI API service provided by Azure.	Yes
`openai_api_version`	The version of the Azure OpenAI service to utilize, specified by a date.	Yes
`openai_deployment_name`	The name of the deployment resource for the Azure OpenAI service.	Yes
`openai_organization`	An optional field to specify the organization in OpenAI.	No
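For Microsoft Entra ID authentication, the parameters in the preceding table might be combined as in the following sketch. The helper name azure_entra_openai_config, the placeholder IDs, and the secret scope and key names are hypothetical. The base URL, deployment name, and API version reuse the illustrative values from the Azure OpenAI example in this article.

```python
from typing import Any, Dict


def azure_entra_openai_config(
    tenant_id: str,
    client_id: str,
    client_secret_ref: str,
    api_base: str,
    deployment_name: str,
    api_version: str,
) -> Dict[str, Any]:
    """Build the openai_config block for Azure OpenAI with Entra ID."""
    return {
        # "azuread" selects Microsoft Entra ID authentication.
        "openai_api_type": "azuread",
        "microsoft_entra_tenant_id": tenant_id,
        "microsoft_entra_client_id": client_id,
        "microsoft_entra_client_secret": client_secret_ref,
        "openai_api_base": api_base,
        "openai_deployment_name": deployment_name,
        "openai_api_version": api_version,
    }


entra_endpoint_config = {
    "served_entities": [
        {
            "external_model": {
                "name": "gpt-3.5-turbo",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": azure_entra_openai_config(
                    tenant_id="<YOUR_TENANT_ID>",
                    client_id="<YOUR_CLIENT_ID>",
                    client_secret_ref="{{secrets/my_entra_secret_scope/client_secret}}",
                    api_base="https://my-azure-openai-endpoint.openai.azure.com",
                    deployment_name="my-gpt-35-turbo-deployment",
                    api_version="2023-05-15",
                ),
            }
        }
    ]
}
```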

The following example demonstrates how to create an endpoint with Azure OpenAI:

Python
client.create_endpoint(
    name="openai-chat-endpoint",
    config={
        "served_entities": [{
            "external_model": {
                "name": "gpt-3.5-turbo",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    "openai_api_type": "azure",
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                    "openai_deployment_name": "my-gpt-35-turbo-deployment",
                    "openai_api_version": "2023-05-15"
                }
            }
        }]
    }
)

Google Cloud Vertex AI

Configuration Parameter	Description	Required
`private_key`	The Databricks secret key reference for a private key for the service account which has access to the Google Cloud Vertex AI Service. See Best practices for managing service account keys. If you prefer to paste your private key directly, see `private_key_plaintext`.	You must provide a private key using one of the following fields: `private_key` or `private_key_plaintext`.
`private_key_plaintext`	The private key for the service account which has access to the Google Cloud Vertex AI Service provided as a plaintext secret. See Best practices for managing service account keys. If you prefer to reference your key using Databricks Secrets, see `private_key`.	You must provide a private key using one of the following fields: `private_key` or `private_key_plaintext`.
`region`	This is the region for the Google Cloud Vertex AI Service. See supported regions for more details. Some models are only available in specific regions.	Yes
`project_id`	This is the Google Cloud project id that the service account is associated with.	Yes

Amazon Bedrock

To use Amazon Bedrock as an external model provider, make sure that Bedrock is enabled in the specified AWS region and that the specified AWS credentials have the appropriate permissions to interact with Bedrock services. For more information, see AWS Identity and Access Management.

Amazon Bedrock supports multiple authentication mechanisms. Exactly one of the following must be provided:

A Unity Catalog service credential that references an AWS IAM role
An AWS instance profile
AWS access keys (either using Databricks Secrets or plaintext)

If access keys are used, both an access key ID and secret access key are required.
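The "exactly one" rule above can be checked locally before you submit a configuration. The following is a minimal sketch: the function name validate_bedrock_auth and the returned labels are hypothetical, and the field names are the ones listed in the configuration table that follows. Databricks performs its own validation, which this sketch does not replace.

```python
from typing import Any, Dict, List

_ACCESS_KEY_ID_FIELDS = ("aws_access_key_id", "aws_access_key_id_plaintext")
_SECRET_KEY_FIELDS = ("aws_secret_access_key", "aws_secret_access_key_plaintext")


def validate_bedrock_auth(bedrock_config: Dict[str, Any]) -> str:
    """Return the single authentication mechanism in use, or raise ValueError."""
    found: List[str] = []

    if bedrock_config.get("uc_service_credential_name"):
        found.append("uc_service_credential")
    if bedrock_config.get("instance_profile_arn"):
        found.append("instance_profile")

    has_id = any(bedrock_config.get(k) for k in _ACCESS_KEY_ID_FIELDS)
    has_secret = any(bedrock_config.get(k) for k in _SECRET_KEY_FIELDS)
    if has_id or has_secret:
        # Access keys only count when both halves of the pair are present.
        if not (has_id and has_secret):
            raise ValueError(
                "Access key authentication needs both an access key ID "
                "and a secret access key."
            )
        found.append("access_keys")

    if len(found) != 1:
        raise ValueError(
            f"Expected exactly one authentication mechanism, found: {found}"
        )
    return found[0]
```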

Configuration parameter	Description	Required
`aws_region`	The AWS region to use. Amazon Bedrock must be enabled in this region.	Yes
`uc_service_credential_name`	Reference to a Unity Catalog service credential used by the served entity to access AWS resources. The credential must reference an AWS IAM role.	No
`instance_profile_arn`	Amazon Resource Name (ARN) of the instance profile used by the served entity to access AWS resources.	No
`aws_access_key_id`	AWS access key ID used by the served entity to access AWS resources, provided using a Databricks secret reference.	No
`aws_access_key_id_plaintext`	AWS access key ID used by the served entity to access AWS resources, provided as a plaintext string.	No
`aws_secret_access_key`	AWS secret access key used by the served entity to access AWS resources, paired with the access key ID and provided using a Databricks secret reference.	No
`aws_secret_access_key_plaintext`	AWS secret access key used by the served entity to access AWS resources, paired with the access key ID and provided as a plaintext string.	No
`bedrock_provider`	The underlying provider in Amazon Bedrock. Supported values (case insensitive): Anthropic, Cohere, AI21Labs, Amazon.	Yes

The following example demonstrates how to create an endpoint with Amazon Bedrock using a Unity Catalog service credential. If you prefer to use an instance profile, use instance_profile_arn. If you prefer to use access keys, use aws_access_key_id and aws_secret_access_key.

Python
client.create_endpoint(
    name="bedrock-anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-v2",
                    "provider": "amazon-bedrock",
                    "task": "llm/v1/completions",
                    "amazon_bedrock_config": {
                        "aws_region": "<YOUR_AWS_REGION>",
                        "uc_service_credential_name": "<YOUR_UC_SERVICE_CREDENTIAL_NAME>", ## Remove if using other authentication methods
                        # "instance_profile_arn": "<YOUR_AWS_INSTANCE_PROFILE_ARN>",
                        # "aws_access_key_id": "{{secrets/my_amazon_bedrock_secret_scope/aws_access_key_id}}",
                        # "aws_secret_access_key": "{{secrets/my_amazon_bedrock_secret_scope/aws_secret_access_key}}",
                        "bedrock_provider": "anthropic",
                    },
                }
            }
        ]
    },
)

note

To use existing Amazon Bedrock guardrails with Amazon Bedrock models through AI Gateway, you may set the X-Amzn-Bedrock-GuardrailIdentifier header to the guardrail ARN of your choosing, and set X-Amzn-Bedrock-GuardrailVersion as necessary. These headers are forwarded to Amazon Bedrock.

If there are AWS permission issues, Databricks recommends that you verify the credentials directly with the Amazon Bedrock API.
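The guardrail headers described in the preceding note can be built as a plain dictionary and attached to a query request. This is a sketch only: the helper name bedrock_guardrail_headers is hypothetical, and the ARN shown in the usage comment is a placeholder.

```python
from typing import Dict, Optional


def bedrock_guardrail_headers(
    guardrail_arn: str, guardrail_version: Optional[str] = None
) -> Dict[str, str]:
    """Build the request headers that select an Amazon Bedrock guardrail."""
    headers = {"X-Amzn-Bedrock-GuardrailIdentifier": guardrail_arn}
    if guardrail_version is not None:
        # The version header is only set when a version is specified.
        headers["X-Amzn-Bedrock-GuardrailVersion"] = str(guardrail_version)
    return headers


# Example usage with the OpenAI client, which accepts per-request headers
# through its extra_headers argument (requires a workspace and an endpoint):
# client.chat.completions.create(
#     model="<YOUR_BEDROCK_ENDPOINT_NAME>",
#     messages=[{"role": "user", "content": "Hello"}],
#     extra_headers=bedrock_guardrail_headers("<YOUR_GUARDRAIL_ARN>", "1"),
# )
```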

AI21 Labs

Configuration Parameter	Description	Required	Default
`ai21labs_api_key`	The Databricks secret key reference for an AI21 Labs API key. If you prefer to paste your API key directly, see `ai21labs_api_key_plaintext`.	You must provide an API key using one of the following fields: `ai21labs_api_key` or `ai21labs_api_key_plaintext`.
`ai21labs_api_key_plaintext`	An AI21 Labs API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `ai21labs_api_key`.	You must provide an API key using one of the following fields: `ai21labs_api_key` or `ai21labs_api_key_plaintext`.

Custom provider

important

A custom provider model must be OpenAI API compatible.

Configuration Parameter	Description	Required
`custom_provider_url`	The URL for where the custom provider's model lives. The URL must point to a specific API endpoint; for example, `https://api.provider.com/chat/completions`.	Yes
`bearer_token_auth`	If the custom provider utilizes bearer token authentication, specify the required fields.	You must provide an authentication method using one of the following fields: `bearer_token_auth` or `api_key_auth`.
`token`	The Databricks secret key reference for the token for bearer authentication. This parameter must be nested under `bearer_token_auth`. If you prefer to paste your API key directly, see `token_plaintext`.	If using bearer authentication, you must provide an API key using one of the following fields: `token` or `token_plaintext`.
`token_plaintext`	The token for bearer authentication provided as a plaintext string. This parameter must be nested under `bearer_token_auth`. If you prefer to reference your key using Databricks secrets, see `token`.	If using bearer authentication, you must provide an API key using one of the following fields: `token` or `token_plaintext`.
`api_key_auth`	If the custom provider utilizes API key authentication, specify the required fields.	You must provide an authentication method using one of the following fields: `bearer_token_auth` or `api_key_auth`.
`key`	The key for API key authentication. This parameter must be nested under `api_key_auth`.	Yes, when using API key authentication.
`value`	The Databricks secret key reference for the value for API key authentication. If you prefer to paste your API key directly, see `value_plaintext`.	If using API key authentication, you must provide an API key using one of the following fields: `value` or `value_plaintext`.
`value_plaintext`	The value for API key authentication provided as a plaintext string. If you prefer to reference your key using Databricks secrets, see `value`.	If using API key authentication, you must provide an API key using one of the following fields: `value` or `value_plaintext`.

The following example demonstrates how to create an endpoint with a custom provider using bearer authentication:

Python
client.create_endpoint(
    name="custom-provider-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "custom-provider-model",
                    "provider": "custom",
                    "task": "llm/v1/chat",
                    "custom_provider_config": {
                        "custom_provider_url": "https://api.provider.com/chat/completions",
                        "bearer_token_auth": {
                            "token": "{{secrets/my_custom_provider_secret_scope/custom_provider_token}}"
                        }
                    }
                }
            }
        ]
    },
)

The following example demonstrates how to create an endpoint with a custom provider using API key authentication:

Python
client.create_endpoint(
    name="custom-provider-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "custom-provider-model",
                    "provider": "custom",
                    "task": "llm/v1/chat",
                    "custom_provider_config": {
                        "custom_provider_url": "https://my-custom-provider.com",
                        "api_key_auth": {
                            "key": "X-API-KEY",
                            "value": "{{secrets/my_custom_provider_secret_scope/custom_provider_api_key}}"
                        }
                    }
                }
            }
        ]
    },
)

Configure AI Gateway on an endpoint

You can also configure your endpoint to enable Mosaic AI Gateway features, such as rate limiting, usage tracking and guardrails.

See Configure AI Gateway on model serving endpoints.

Query an external model endpoint

After you create an external model endpoint, it is ready to receive traffic from users.

You can send scoring requests to the endpoint using the OpenAI client, the REST API or the MLflow Deployments SDK.

See the standard query parameters for a scoring request in POST /serving-endpoints/{name}/invocations.
Use foundation models

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

Python
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

completion = client.completions.create(
  model="anthropic-completions-endpoint",
  prompt="what is databricks",
  temperature=1.0
)
print(completion)

Expected output response format:

Python
{
  "id": "123", # Not Required
  "model": "anthropic-completions-endpoint",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
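The same query can also be sent through the REST API. The following sketch builds a POST request to the /serving-endpoints/{name}/invocations path using only the Python standard library. The helper name build_invocation_request and the workspace URL are hypothetical, and the use of a bearer token in the Authorization header is an assumption based on standard Databricks token authentication.

```python
import json
import urllib.request
from typing import Any, Dict


def build_invocation_request(
    workspace_url: str, endpoint_name: str, token: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a scoring request for a serving endpoint."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_invocation_request(
    workspace_url="https://<YOUR_WORKSPACE_URL>",
    endpoint_name="anthropic-completions-endpoint",
    token="dapi-your-databricks-token",
    payload={"prompt": "what is databricks", "temperature": 1.0},
)

# Sending the request requires network access to a workspace:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```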

Additional query parameters

You can pass any additional parameters supported by the endpoint's provider as part of your query.

For example:

logit_bias (supported by OpenAI, Cohere).
top_k (supported by Anthropic, Cohere).
frequency_penalty (supported by OpenAI, Cohere).
presence_penalty (supported by OpenAI, Cohere).
stream (supported by OpenAI, Anthropic, Cohere, Amazon Bedrock for Anthropic). This is only available for chat and completions requests.

tools (supported by OpenAI, Anthropic, Amazon Bedrock for Anthropic). This is only available for chat and completions requests. This parameter allows the integration of external functionalities, including Computer Use (beta) for Anthropic and Amazon Bedrock for Anthropic. See Function calling on Databricks.
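As a sketch of how the provider support listed above could be enforced on the client side, the following helper merges extra parameters into a request payload and rejects those that the list does not show as supported for the endpoint's provider. The names EXTRA_PARAM_SUPPORT and add_extra_params are hypothetical, the table covers only four of the parameters named above, and parameters that are not in the table are passed through unchecked.

```python
from typing import Any, Dict

# Provider support as listed in this article (subset).
EXTRA_PARAM_SUPPORT = {
    "logit_bias": {"openai", "cohere"},
    "top_k": {"anthropic", "cohere"},
    "frequency_penalty": {"openai", "cohere"},
    "presence_penalty": {"openai", "cohere"},
}


def add_extra_params(
    payload: Dict[str, Any], provider: str, **extra: Any
) -> Dict[str, Any]:
    """Return a copy of payload with provider-specific parameters added."""
    unsupported = sorted(
        name
        for name in extra
        if name in EXTRA_PARAM_SUPPORT
        and provider not in EXTRA_PARAM_SUPPORT[name]
    )
    if unsupported:
        raise ValueError(
            f"Provider '{provider}' is not listed as supporting: {unsupported}"
        )
    merged = dict(payload)
    merged.update(extra)
    return merged


payload = add_extra_params(
    {"prompt": "what is databricks", "temperature": 1.0},
    provider="anthropic",
    top_k=40,
)
```

When you use the OpenAI client, parameters that are not part of its method signature, such as top_k, can be passed through its extra_body argument.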

Network connectivity configurations support for external models

Support for network connectivity configurations (NCCs) for external models, including AWS PrivateLink, is currently in Public Preview. Reach out to your Databricks account team to participate in the preview.

Limitations

Depending on the external model you choose, your configuration might cause your data to be processed outside of the region where your data originated. See Model Serving limits and regions.
