External models in Mosaic AI Model Serving

Important

The code examples in this article demonstrate usage of the Public Preview MLflow Deployments CRUD API.

This article describes external models in Mosaic AI Model Serving, including supported model providers and limitations.

What are external models?

Important

You can now configure Mosaic AI Gateway on model serving endpoints that serve external models. AI Gateway brings governance, monitoring, and production readiness to these model serving endpoints. See Mosaic AI Gateway.

External models are third-party models hosted outside of Databricks. Supported by Model Serving, external models allow you to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. You can also use Mosaic AI Model Serving as a provider to serve custom models, which offers rate limits for those endpoints. As part of this support, Model Serving offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM-related requests.

In addition, Databricks support for external models provides centralized credential management. By storing API keys in one secure location, organizations can enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps to prevent exposing these keys within code or requiring end users to manage keys safely.

See Tutorial: Create external model endpoints to query OpenAI models for step-by-step guidance on creating external model endpoints and querying the models they serve using the MLflow Deployments SDK. For instructions on using the Serving UI and the REST API, see the corresponding guides.

Requirements

Model providers

External models in Model Serving are designed to support a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI or Anthropic. Each provider has specific characteristics and configurations that are encapsulated within the external_model field of the external model endpoint configuration.

The following providers are supported:

  • openai: For models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with AAD.

  • anthropic: For models offered by Anthropic.

  • cohere: For models offered by Cohere.

  • amazon-bedrock: For models offered by Amazon Bedrock.

  • google-cloud-vertex-ai: For models offered by Google Cloud Vertex AI.

  • databricks-model-serving: For Mosaic AI Model Serving endpoints with compatible schemas. See Endpoint configuration.

To request support for a provider not listed here, reach out to your Databricks account team.

Supported models

The model you choose directly affects the responses you get from API calls. Therefore, choose a model that fits your use-case requirements. For instance, to generate conversational responses, choose a chat model. To generate embeddings of text, choose an embedding model.

The following is a non-exhaustive list of supported models and their corresponding endpoint types, grouped by model provider. Use the model associations listed below as a guide when configuring an endpoint for newly released model types as they become available with a given provider. Customers are responsible for ensuring compliance with applicable model licenses.

Note

With the rapid development of LLMs, there is no guarantee that this list is up to date at all times.

OpenAI**

  • llm/v1/completions: gpt-3.5-turbo-instruct, babbage-002, davinci-002

  • llm/v1/chat: gpt-3.5-turbo, gpt-4, gpt-4o, gpt-4o-2024-05-13, gpt-4o-mini, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-4-0125-preview, gpt-4-turbo-preview, gpt-4-1106-preview, gpt-4-vision-preview, gpt-4-1106-vision-preview

  • llm/v1/embeddings: text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small

Azure OpenAI**

  • llm/v1/completions: text-davinci-003, gpt-35-turbo-instruct

  • llm/v1/chat: gpt-35-turbo, gpt-35-turbo-16k, gpt-4, gpt-4-32k, gpt-4o, gpt-4o-mini

  • llm/v1/embeddings: text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small

Anthropic

  • llm/v1/completions: claude-1, claude-1.3-100k, claude-2, claude-2.1, claude-2.0, claude-instant-1.2

  • llm/v1/chat: claude-3-5-sonnet-20240620, claude-3-haiku-20240307, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-2.1, claude-2.0, claude-instant-1.2

Cohere**

  • llm/v1/completions: command, command-light

  • llm/v1/chat: command-r-plus, command-r, command, command-light-nightly, command-light, command-nightly

  • llm/v1/embeddings: embed-english-v2.0, embed-multilingual-v2.0, embed-english-light-v2.0, embed-english-v3.0, embed-english-light-v3.0, embed-multilingual-v3.0, embed-multilingual-light-v3.0

Mosaic AI Model Serving

  • llm/v1/completions: Databricks serving endpoint

  • llm/v1/chat: Databricks serving endpoint

  • llm/v1/embeddings: Databricks serving endpoint

Amazon Bedrock

  • llm/v1/completions: Anthropic (claude-instant-v1, claude-v2); Cohere (command-text-v14, command-light-text-v14); AI21 Labs (j2-grande-instruct, j2-jumbo-instruct, j2-mid, j2-mid-v1, j2-ultra, j2-ultra-v1)

  • llm/v1/chat: Anthropic (claude-v2, claude-v2:1, claude-3-sonnet-20240229-v1:0, claude-3-5-sonnet-20240620-v1:0); Cohere (command-r-plus-v1:0, command-r-v1:0)

  • llm/v1/embeddings: Amazon (titan-embed-text-v1, titan-embed-g1-text-02); Cohere (embed-english-v3, embed-multilingual-v3)

AI21 Labs†

  • llm/v1/completions: j2-mid, j2-light, j2-ultra

Google Cloud Vertex AI

  • llm/v1/completions: text-bison

  • llm/v1/chat: chat-bison, gemini-pro, gemini-1.0-pro, gemini-1.5-pro, gemini-1.5-flash

  • llm/v1/embeddings: textembedding-gecko

** Model provider supports fine-tuned completion and chat models. To query a fine-tuned model, populate the name field of the external model configuration with the name of your fine-tuned model.

† Model provider supports custom completion models.

Use models served on Mosaic AI Model Serving endpoints

Using Mosaic AI Model Serving endpoints as a provider is supported for the llm/v1/completions, llm/v1/chat, and llm/v1/embeddings endpoint types. These endpoints must accept the standard query parameters marked as required. Other parameters might be ignored, depending on whether the Mosaic AI Model Serving endpoint supports them.

See POST /serving-endpoints/{name}/invocations in the API reference for standard query parameters.

These endpoints must produce responses in the following OpenAI format.

For completions tasks:

{
  "id": "123", # Not Required
  "model": "test_databricks_model",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For chat tasks:

{
  "id": "123", # Not Required
  "model": "test_chat_model",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  },
  {
    "index": 1,
    "message": {
      "role": "human",
      "content": "\n\nWhat is the weather in San Francisco?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For embeddings tasks:

{
  "data": [
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 0
    },
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 1
    }
  ],
  "model": "test_embedding_model",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
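The response formats above can be checked programmatically before you register a Databricks endpoint as a provider. The following is a minimal sketch, not part of any Databricks or MLflow API: the function name missing_fields and the REQUIRED_FIELDS mapping are hypothetical, and the set of fields treated as required is a conservative subset inferred from the examples (fields commented "Not Required" are skipped).

```python
from typing import Any

# Keys inferred from the example responses above. This mapping is an
# assumption for illustration, not an official schema.
REQUIRED_FIELDS = {
    "llm/v1/completions": {
        "top": ("model", "choices", "usage"),
        "list": "choices",
        "item": ("text", "index"),
    },
    "llm/v1/chat": {
        "top": ("model", "choices", "usage"),
        "list": "choices",
        "item": ("index", "message"),
    },
    "llm/v1/embeddings": {
        "top": ("data", "model", "usage"),
        "list": "data",
        "item": ("embedding", "index"),
    },
}


def missing_fields(task: str, response: dict[str, Any]) -> list[str]:
    """Return the paths of expected fields that are absent from a response."""
    spec = REQUIRED_FIELDS[task]
    # Check the top-level keys first, in the order listed in the spec.
    missing = [key for key in spec["top"] if key not in response]
    # Then check every element of the choices/data list.
    for i, item in enumerate(response.get(spec["list"], [])):
        missing += [
            f"{spec['list']}[{i}].{key}" for key in spec["item"] if key not in item
        ]
    return missing
```

An empty list means the response contains every field this sketch looks for. It does not validate value types or token counts.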

Endpoint configuration

To serve and query external models, you need to configure a serving endpoint. See Create an external model serving endpoint.

For an external model serving endpoint, you must include the external_model field and its parameters in the served_entities section of the endpoint configuration. If you configure multiple external models in a serving endpoint, you must provide a traffic_config to define the traffic routing percentage for each external model.
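As an illustration of the traffic_config requirement, the following sketch builds a routing section for several served entities. It assumes that traffic_config takes a routes list whose items carry served_model_name and traffic_percentage, and that the percentages must sum to 100. Verify both against the POST /api/2.0/serving-endpoints reference. The helper name build_traffic_config is hypothetical.

```python
def build_traffic_config(split: dict[str, int]) -> dict:
    """Build a traffic_config dict from {served entity name: percentage}."""
    total = sum(split.values())
    # Assumption: the routing percentages must add up to exactly 100.
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")
    return {
        "routes": [
            {"served_model_name": name, "traffic_percentage": pct}
            for name, pct in split.items()
        ]
    }


# Example: send 70% of requests to one served entity and 30% to another.
traffic_config = build_traffic_config({"primary": 70, "fallback": 30})
```

The keys of the input dict correspond to the name of each entry in served_entities. The resulting dict goes next to served_entities in the endpoint config.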

The external_model field defines the model to which this endpoint forwards requests. When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers might not. If the model is not supported by the provider, Databricks returns an HTTP 4xx error when trying to route requests to that model.

The following table summarizes the external_model field parameters. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

Parameter

Description

name

The name of the model to use. For example, gpt-3.5-turbo for OpenAI’s GPT-3.5-Turbo model.

provider

Specifies the name of the provider for this model. This string value must correspond to a supported external model provider. For example, openai for OpenAI’s GPT-3.5 models.

task

The task corresponds to the type of language model interaction you desire. Supported tasks are “llm/v1/completions”, “llm/v1/chat”, “llm/v1/embeddings”.

<provider>_config

Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key. See Configure the provider for an endpoint.
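The four parameters above can be assembled with a small helper. This is a sketch under one assumption: that the <provider>_config field name is the provider string with hyphens replaced by underscores plus a _config suffix. That rule matches the anthropic_config, openai_config, and amazon_bedrock_config names used in this article, but it is inferred rather than documented here. The helper names are hypothetical.

```python
SUPPORTED_TASKS = ("llm/v1/completions", "llm/v1/chat", "llm/v1/embeddings")


def provider_config_key(provider: str) -> str:
    """Derive the config field name, e.g. 'amazon-bedrock' -> 'amazon_bedrock_config'."""
    return provider.replace("-", "_") + "_config"


def external_model(name: str, provider: str, task: str, provider_config: dict) -> dict:
    """Assemble an external_model dict from the four parameters in the table."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return {
        "name": name,
        "provider": provider,
        "task": task,
        # Copy the provider settings so later changes to the input do not leak in.
        provider_config_key(provider): dict(provider_config),
    }
```

The helper only checks the task string. It cannot tell whether the provider actually supports the named model.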

The following is an example of creating an external model endpoint using the create_endpoint() API. In this example, a request sent to the completion endpoint is forwarded to the claude-2 model provided by anthropic.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    }
                }
            }
        ]
    }
)

Configure the provider for an endpoint

When you create an endpoint, you must supply the required configurations for the specified model provider. The following sections summarize the available endpoint configuration parameters for each model provider.

Note

Databricks encrypts and securely stores the provided credentials for each model provider. These credentials are automatically deleted when their associated endpoints are deleted.
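The examples in this article pass credentials as secret references of the form {{secrets/<scope>/<key>}}. The following sketch formats such a reference. The helper name secret_ref is hypothetical, and the check that rejects empty parts or parts containing a slash is an assumption added for illustration.

```python
def secret_ref(scope: str, key: str) -> str:
    """Format a Databricks secret reference such as {{secrets/my_scope/my_key}}."""
    for part in (scope, key):
        # Assumption: scope and key names are non-empty and contain no slash.
        if not part or "/" in part:
            raise ValueError(f"invalid secret path component: {part!r}")
    return "{{secrets/" + scope + "/" + key + "}}"


# Example: the reference used by the Anthropic endpoint example in this article.
anthropic_key_ref = secret_ref("my_anthropic_secret_scope", "anthropic_api_key")
```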

OpenAI

Configuration Parameter

Description

Required

Default

openai_api_key

The Databricks secret key reference for an OpenAI API key using the OpenAI service. If you prefer to paste your API key directly, see openai_api_key_plaintext.

You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_key_plaintext

The OpenAI API key using the OpenAI service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see openai_api_key.

You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_type

An optional field to specify the type of OpenAI API to use.

No

openai

openai_api_base

The base URL for the OpenAI API.

No

https://api.openai.com/v1

openai_api_version

An optional field to specify the OpenAI API version.

No

openai_organization

An optional field to specify the organization in OpenAI.

No

Cohere

Configuration Parameter

Description

Required

Default

cohere_api_key

The Databricks secret key reference for a Cohere API key. If you prefer to paste your API key directly, see cohere_api_key_plaintext.

You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.

cohere_api_key_plaintext

The Cohere API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see cohere_api_key.

You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.

cohere_api_base

The base URL for the Cohere service.

No

Anthropic

Configuration Parameter

Description

Required

Default

anthropic_api_key

The Databricks secret key reference for an Anthropic API key. If you prefer to paste your API key directly, see anthropic_api_key_plaintext.

You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.

anthropic_api_key_plaintext

The Anthropic API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see anthropic_api_key.

You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.
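The provider tables in this section share a pattern: each credential is supplied through either a secret-reference field or its _plaintext counterpart. The following sketch checks that pattern before a configuration is submitted. The helper name check_key_fields is hypothetical, and rejecting configurations that set both fields is an assumption. The tables state only that one of the two must be provided.

```python
def check_key_fields(config: dict, field: str) -> str:
    """Return the credential field that is set, or raise if zero or both are set."""
    candidates = (field, field + "_plaintext")
    provided = [name for name in candidates if config.get(name)]
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of {candidates[0]} or {candidates[1]}, "
            f"found {len(provided)}"
        )
    return provided[0]
```

For example, check_key_fields(anthropic_config, "anthropic_api_key") returns "anthropic_api_key" when only the secret reference is set.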

Azure OpenAI

Azure OpenAI has distinct features as compared with the direct OpenAI service. For an overview, please see the comparison documentation.

Configuration Parameter

Description

Required

Default

openai_api_key

The Databricks secret key reference for an OpenAI API key using the Azure service. If you prefer to paste your API key directly, see openai_api_key_plaintext.

You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_key_plaintext

The OpenAI API key using the Azure service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see openai_api_key.

You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_type

Use azure for access token validation.

Yes

openai_api_base

The base URL for the Azure OpenAI API service provided by Azure.

Yes

openai_api_version

The version of the Azure OpenAI service to utilize, specified by a date.

Yes

openai_deployment_name

The name of the deployment resource for the Azure OpenAI service.

Yes

openai_organization

An optional field to specify the organization in OpenAI.

No

If you are using Azure OpenAI with Microsoft Entra ID, use the following parameters in your endpoint configuration.

Configuration Parameter

Description

Required

Default

microsoft_entra_tenant_id

The tenant ID for Microsoft Entra ID authentication.

Yes

microsoft_entra_client_id

The client ID for Microsoft Entra ID authentication.

Yes

microsoft_entra_client_secret

The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication. If you prefer to paste your client secret directly, see microsoft_entra_client_secret_plaintext.

You must provide a client secret using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.

microsoft_entra_client_secret_plaintext

The client secret used for Microsoft Entra ID authentication provided as a plaintext string. If you prefer to reference your client secret using Databricks Secrets, see microsoft_entra_client_secret.

You must provide a client secret using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.

openai_api_type

Use azuread for authentication using Microsoft Entra ID.

Yes

openai_api_base

The base URL for the Azure OpenAI API service provided by Azure.

Yes

openai_api_version

The version of the Azure OpenAI service to utilize, specified by a date.

Yes

openai_deployment_name

The name of the deployment resource for the Azure OpenAI service.

Yes

openai_organization

An optional field to specify the organization in OpenAI.

No

The following example demonstrates how to create an endpoint with Azure OpenAI:

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="openai-chat-endpoint",
    config={
        "served_entities": [{
            "external_model": {
                "name": "gpt-3.5-turbo",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    "openai_api_type": "azure",
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                    "openai_deployment_name": "my-gpt-35-turbo-deployment",
                    "openai_api_version": "2023-05-15"
                }
            }
        }]
    }
)
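For the Microsoft Entra ID variant described earlier in this section, the openai_config block changes as sketched below. The field names come from the Entra ID parameter table. The helper name azure_entra_openai_config and all argument values in the example are placeholders, not real resources.

```python
def azure_entra_openai_config(
    tenant_id: str,
    client_id: str,
    client_secret_ref: str,
    api_base: str,
    deployment_name: str,
    api_version: str,
) -> dict:
    """Build an openai_config dict for Azure OpenAI with Microsoft Entra ID."""
    return {
        # "azuread" selects Microsoft Entra ID authentication.
        "openai_api_type": "azuread",
        "openai_api_base": api_base,
        "openai_api_version": api_version,
        "openai_deployment_name": deployment_name,
        "microsoft_entra_tenant_id": tenant_id,
        "microsoft_entra_client_id": client_id,
        # Secret reference; use microsoft_entra_client_secret_plaintext to paste directly.
        "microsoft_entra_client_secret": client_secret_ref,
    }


# Placeholder values for illustration only.
entra_config = azure_entra_openai_config(
    tenant_id="<YOUR_TENANT_ID>",
    client_id="<YOUR_CLIENT_ID>",
    client_secret_ref="{{secrets/my_openai_secret_scope/entra_client_secret}}",
    api_base="https://my-azure-openai-endpoint.openai.azure.com",
    deployment_name="my-gpt-35-turbo-deployment",
    api_version="2023-05-15",
)
```

The resulting dict takes the place of the openai_config value in the create_endpoint() call. No openai_api_key is set because the Entra ID credentials are used instead.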

Google Cloud Vertex AI

Configuration Parameter

Description

Required

Default

private_key

The Databricks secret key reference for a private key for the service account that has access to the Google Cloud Vertex AI Service. See Best practices for managing service account keys. If you prefer to paste your private key directly, see private_key_plaintext.

You must provide a private key using one of the following fields: private_key or private_key_plaintext.

private_key_plaintext

The private key for the service account that has access to the Google Cloud Vertex AI Service, provided as a plaintext secret. See Best practices for managing service account keys. If you prefer to reference your private key using Databricks Secrets, see private_key.

You must provide a private key using one of the following fields: private_key or private_key_plaintext.

region

This is the region for the Google Cloud Vertex AI Service. See supported regions for more details. Some models are only available in specific regions.

Yes

project_id

This is the Google Cloud project ID that the service account is associated with.

Yes
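The Vertex AI parameters above fit into a served entity as sketched below. The config field name google_cloud_vertex_ai_config is an assumption derived from the provider name, following the pattern of anthropic_config and amazon_bedrock_config. Confirm it in the API reference. The helper name and the example values are placeholders.

```python
def vertex_ai_served_entity(
    model: str, task: str, private_key_ref: str, region: str, project_id: str
) -> dict:
    """Build a served_entities item for a Google Cloud Vertex AI model."""
    return {
        "external_model": {
            "name": model,
            "provider": "google-cloud-vertex-ai",
            "task": task,
            # Assumed field name; derived from the provider string.
            "google_cloud_vertex_ai_config": {
                "private_key": private_key_ref,
                "region": region,
                "project_id": project_id,
            },
        }
    }


# Placeholder secret scope, region, and project ID.
entity = vertex_ai_served_entity(
    model="text-bison",
    task="llm/v1/completions",
    private_key_ref="{{secrets/my_vertex_secret_scope/private_key}}",
    region="us-central1",
    project_id="<YOUR_PROJECT_ID>",
)
```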

Amazon Bedrock

To use Amazon Bedrock as an external model provider, make sure that Bedrock is enabled in the specified AWS region and that the specified AWS key pair has the appropriate permissions to interact with Bedrock services. For more information, see AWS Identity and Access Management.

Configuration Parameter

Description

Required

Default

aws_region

The AWS region to use. Bedrock has to be enabled there.

Yes

instance_profile_arn

Amazon Resource Name (ARN) of the instance profile that the served entity uses to access AWS resources.

You must authenticate using an instance profile or access keys. If you prefer to use access keys, see aws_access_key_id and aws_secret_access_key.

aws_access_key_id

The Databricks secret key reference for an AWS access key ID with permissions to interact with Bedrock services. If you prefer to paste your access key ID directly, see aws_access_key_id_plaintext.

You must authenticate using an instance profile or access keys. If you choose to use access keys, you must provide an API key using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.

aws_access_key_id_plaintext

An AWS access key ID with permissions to interact with Bedrock services provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see aws_access_key_id.

You must authenticate using an instance profile or access keys. If you choose to use access keys, you must provide an API key using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.

aws_secret_access_key

The Databricks secret key reference for an AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services. If you prefer to paste your API key directly, see aws_secret_access_key_plaintext.

You must authenticate using an instance profile or access keys. If you choose to use access keys, you must provide an API key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.

aws_secret_access_key_plaintext

An AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see aws_secret_access_key.

You must authenticate using an instance profile or access keys. If you choose to use access keys, you must provide an API key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.

bedrock_provider

The underlying provider in Amazon Bedrock. Supported values (case insensitive) include: Anthropic, Cohere, AI21Labs, Amazon.

Yes

The following example demonstrates how to create an endpoint with Amazon Bedrock using an instance profile. If you prefer to use access keys, use aws_access_key_id and aws_secret_access_key.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="bedrock-anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-v2",
                    "provider": "amazon-bedrock",
                    "task": "llm/v1/completions",
                    "amazon_bedrock_config": {
                        "aws_region": "<YOUR_AWS_REGION>",
                        "instance_profile_arn": "<YOUR_AWS_INSTANCE_PROFILE_ARN>", ## Remove if using access keys
                        # "aws_access_key_id": "{{secrets/my_amazon_bedrock_secret_scope/aws_access_key_id}}",
                        # "aws_secret_access_key": "{{secrets/my_amazon_bedrock_secret_scope/aws_secret_access_key}}",
                        "bedrock_provider": "anthropic",
                    },
                }
            }
        ]
    },
)

If there are AWS permission issues, Databricks recommends that you verify the credentials directly with the Amazon Bedrock API.

AI21 Labs

Configuration Parameter

Description

Required

Default

ai21labs_api_key

The Databricks secret key reference for an AI21 Labs API key. If you prefer to paste your API key directly, see ai21labs_api_key_plaintext.

You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.

ai21labs_api_key_plaintext

An AI21 Labs API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see ai21labs_api_key.

You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.

Configure AI Gateway on an endpoint

You can also configure your endpoint to enable Mosaic AI Gateway features, such as rate limiting, usage tracking and guardrails.

See Configure AI Gateway on model serving endpoints.

Query an external model endpoint

After you create an external model endpoint, it is ready to receive traffic from users.

You can send scoring requests to the endpoint using the OpenAI client, the REST API or the MLflow Deployments SDK.

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

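from openai import OpenAI

# Point the OpenAI client at your Databricks workspace and authenticate
# with a Databricks token instead of an OpenAI API key.
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# The model field takes the name of the serving endpoint, not the provider's model name.
completion = client.completions.create(
    model="anthropic-completions-endpoint",
    prompt="what is databricks",
    temperature=1.0
)
print(completion)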

Expected output response format:

{
  "id": "123", # Not Required
  "model": "anthropic-completions-endpoint",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
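The same query can also go through the REST API or the MLflow Deployments SDK. The following sketch prepares both. It assumes that the REST API accepts a bearer token in the Authorization header and that the request body for a completions endpoint is a JSON object with a prompt field. The helper names are hypothetical. Only client.predict() and get_deploy_client() are MLflow calls.

```python
import json
import urllib.request


def completions_inputs(prompt: str, **params) -> dict:
    """Build a completions payload; extra keyword arguments pass through unchanged."""
    return {"prompt": prompt, **params}


def rest_request(host: str, endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /serving-endpoints/{name}/invocations."""
    return urllib.request.Request(
        f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Databricks token passed as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_with_mlflow(endpoint: str, payload: dict):
    """Send the payload through the MLflow Deployments client (requires mlflow)."""
    # Imported lazily so the helpers above work without mlflow installed.
    import mlflow.deployments

    client = mlflow.deployments.get_deploy_client("databricks")
    return client.predict(endpoint=endpoint, inputs=payload)
```

To send the REST request, pass the prepared object to urllib.request.urlopen() and parse the JSON body of the response. The MLflow helper needs a configured Databricks environment to run.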

Additional query parameters

You can pass any additional parameters supported by the endpoint’s provider as part of your query.

For example:

  • logit_bias (supported by OpenAI, Cohere).

  • top_k (supported by Anthropic, Cohere).

  • frequency_penalty (supported by OpenAI, Cohere).

  • presence_penalty (supported by OpenAI, Cohere).

  • stream (supported by OpenAI, Anthropic, Cohere, Amazon Bedrock for Anthropic). This is only available for chat and completions requests.
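Because support for these parameters differs by provider, it can help to check a request before sending it. The following sketch encodes only the examples listed above, so the mapping is not exhaustive. The names PARAM_SUPPORT and unsupported_params are hypothetical.

```python
# Provider support for the example parameters listed above. Parameters that
# are not in this mapping are not checked.
PARAM_SUPPORT = {
    "logit_bias": {"openai", "cohere"},
    "top_k": {"anthropic", "cohere"},
    "frequency_penalty": {"openai", "cohere"},
    "presence_penalty": {"openai", "cohere"},
    # amazon-bedrock: streaming is listed only for Anthropic models on Bedrock.
    "stream": {"openai", "anthropic", "cohere", "amazon-bedrock"},
}


def unsupported_params(provider: str, params: dict) -> list[str]:
    """Return the known parameters in params that the provider is not listed for."""
    return sorted(
        name
        for name in params
        if name in PARAM_SUPPORT and provider not in PARAM_SUPPORT[name]
    )
```

For an Anthropic endpoint, a request that sets top_k, logit_bias, and temperature yields ["logit_bias"]. temperature is not in the mapping, so it is not checked.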

Limitations

Depending on the external model you choose, your configuration might cause your data to be processed outside of the region where your data originated. See Model Serving limits and regions.