Mosaic AI Gateway introduction

This article describes Mosaic AI Gateway, the Databricks solution for governing and monitoring access to supported generative AI models and their associated model serving endpoints.

What is Mosaic AI Gateway?

Mosaic AI Gateway is designed to streamline the usage and management of generative AI models and agents within an organization. It is a centralized service that brings governance, monitoring, and production readiness to model serving endpoints. It also allows you to run, secure, and govern AI traffic to democratize and accelerate AI adoption for your organization.

All data is logged into Delta tables in Unity Catalog.

To start visualizing insights from your AI Gateway data, download the example AI Gateway dashboard from GitHub. This dashboard leverages the data from the usage tracking and payload logging inference tables.

After you download the JSON file, import the dashboard into your workspace. For instructions on importing dashboards, see Import a dashboard file.

Supported features

The following table defines the available AI Gateway features and which model serving endpoint types support them.

Feature	Definition	External model endpoint	Foundation Model APIs provisioned throughput endpoint	Foundation Model APIs pay-per-token endpoint	Mosaic AI agents	Custom model endpoint
Permission and rate limiting	Control who has access and how much access.	Supported	Supported	Supported	Not supported	Supported
Payload logging	Monitor and audit data being sent to model APIs using inference tables.	Supported	Supported	Supported	Supported	Supported
Usage tracking	Monitor operational usage on endpoints and associated costs using system tables.	Supported	Supported	Supported	Not supported	Supported
AI Guardrails	Prevent unwanted and unsafe data in requests and responses. See AI Guardrails.	Supported	Supported	Supported	Not supported	Not supported
Fallbacks	Minimize production outages during and after deployment.	Supported	Not supported	Not supported	Not supported	Not supported
Traffic splitting	Load balance traffic across models.	Supported	Supported	Not supported	Not supported	Supported

Mosaic AI Gateway incurs charges on an enabled feature basis. Paid features include payload logging and usage tracking. Features such as query permissions, rate limiting, fallbacks, and traffic splitting are free of charge. Any new features are subject to charge.

AI Guardrails

Preview

This feature is in Public Preview.

AI Guardrails allow users to configure and enforce data compliance at the model serving endpoint level and to reduce harmful content on any requests sent to the underlying model. Bad requests and responses are blocked and a default message is returned to the user. See how to configure guardrails on a model serving endpoint.

important

The AI Guardrails moderation service has a dependency on Foundation Model APIs pay-per-token models. This dependency limits the availability of the AI Guardrails moderation service to regions that support Foundation Model APIs pay-per-token. Regions that require cross-geo enablement to use Foundation Model APIs pay-per-token do not support AI Guardrails.

The following table summarizes the configurable guardrails. See Limitations.

Guardrail	Definition
Safety filtering	Safety filtering prevents your model from interacting with unsafe and harmful content, like violent crime, self-harm, and hate speech. AI Gateway safety filter is built with Meta Llama 3. Databricks uses Llama Guard 2-8b as the safety filter. To learn more about the Llama Guard safety filter and what topics apply to the safety filter, see the Meta Llama Guard 2 8B model card. Meta Llama 3 is licensed under the LLAMA 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. Customers are responsible for ensuring compliance with applicable model licenses.
Personally identifiable information (PII) detection	Customers can detect any sensitive information such as credit card numbers for users. For this feature, AI Gateway uses Presidio to detect the following U.S. categories of PII: credit card numbers, email addresses, phone numbers, bank account numbers, and social security numbers. The PII classifier can help identify sensitive information or PII in structured and unstructured data. However, because it is using automated detection mechanisms, there is no guarantee that the service will find all sensitive information. Consequently, additional systems and protections should be employed. These classification methods are primarily scoped to U.S. categories of PII, such as U.S. phone numbers, and social security numbers.

Guardrail

Definition

Safety filtering

Safety filtering prevents your model from interacting with unsafe and harmful content, like violent crime, self-harm, and hate speech.

AI Gateway safety filter is built with Meta Llama 3. Databricks uses Llama Guard 2-8b as the safety filter. To learn more about the Llama Guard safety filter and what topics apply to the safety filter, see the Meta Llama Guard 2 8B model card.

Personally identifiable information (PII) detection

Customers can detect any sensitive information such as credit card numbers for users.

For this feature, AI Gateway uses Presidio to detect the following U.S. categories of PII: credit card numbers, email addresses, phone numbers, bank account numbers, and social security numbers.

The PII classifier can help identify sensitive information or PII in structured and unstructured data. However, because it is using automated detection mechanisms, there is no guarantee that the service will find all sensitive information. Consequently, additional systems and protections should be employed.

These classification methods are primarily scoped to U.S. categories of PII, such as U.S. phone numbers, and social security numbers.

Use AI Gateway

You can configure AI Gateway features on your model serving endpoints using the Serving UI. See Configure AI Gateway on model serving endpoints.

Limitations

The following are limitations for AI Gateway-enabled endpoints:

When AI guardrails are used, the request batch size, that is an embeddings batch size, completions batch size, or the n parameter of chat requests, can not exceed 16.
If you use function calling and specify AI guardrails, those guardrails are not applied to the requests and intermediate responses of the function. However, guardrails are applied to the final output response.
Text-to-image workloads are not supported.
Only usage tracking is supported for batch inference workloads on pay-per-token endpoints that have AI Gateway features enabled. In the endpoint_usage system table only the rows corresponding to the batch inference request are visible.
AI guardrails and fallbacks are not supported on custom model serving endpoints.
For custom model serving endpoints, only workloads that are not route-optimized support rate limiting and usage tracking.
Inference tables for route-optimized model serving endpoints are in Public Preview.
See AI Gateway-enabled inference table limitations for details on inference table limitations.

What is Mosaic AI Gateway?​

Supported features​

AI Guardrails​

Use AI Gateway​

Limitations​

What is Mosaic AI Gateway?

Supported features

AI Guardrails

Use AI Gateway

Limitations