Model Serving limits and regions
This article summarizes the limitations and region availability for Mosaic AI Model Serving and supported endpoint types.
Resource and payload limits
Mosaic AI Model Serving imposes default limits to ensure reliable performance. If you have feedback on these limits, reach out to your Databricks account team.
The following table summarizes resource and payload limitations for model serving endpoints.
Feature | Granularity | Limit |
---|---|---|
Payload size | Per request | 16 MB. For endpoints serving foundation models, external models, or AI agents, the limit is 4 MB. |
Request/response size | Per request | Any request/response over 1 MB will not be logged. |
Queries per second (QPS) | Per workspace | 200. For higher QPS, enable route optimization. |
Model execution duration | Per request | 120 seconds |
CPU endpoint model memory usage | Per endpoint | 4 GB |
GPU endpoint model memory usage | Per endpoint | Greater than or equal to the assigned GPU memory, which depends on the GPU workload size |
Provisioned concurrency | Per model and per workspace | 200 concurrency. Can be increased by reaching out to your Databricks account team. |
Overhead latency | Per request | Less than 50 milliseconds |
Init scripts | | Init scripts are not supported. |
Foundation Model APIs rate limits | Per workspace | See Foundation Model APIs rate limits and quotas for detailed information about pay-per-token and provisioned throughput limits. |
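The payload and logging limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the function name `check_payload`, the `endpoint_kind` labels, and the choice of 1 MB = 1,048,576 bytes are assumptions made for illustration, and the size is measured on the JSON-serialized body only.

```python
import json

# Assumption: 1 MB = 1,048,576 bytes. The article does not define the unit.
MB = 1024 * 1024

# Limits taken from the resource and payload limits table.
DEFAULT_PAYLOAD_LIMIT = 16 * MB
REDUCED_PAYLOAD_LIMIT = 4 * MB  # foundation models, external models, AI agents
LOGGING_THRESHOLD = 1 * MB      # larger requests/responses are not logged

# Hypothetical labels for the endpoint types that get the 4 MB limit.
REDUCED_LIMIT_KINDS = {"foundation_model", "external_model", "agent"}


def check_payload(payload, endpoint_kind="custom"):
    """Return (size_bytes, within_limit, will_be_logged) for a JSON payload."""
    size = len(json.dumps(payload).encode("utf-8"))
    if endpoint_kind in REDUCED_LIMIT_KINDS:
        limit = REDUCED_PAYLOAD_LIMIT
    else:
        limit = DEFAULT_PAYLOAD_LIMIT
    return size, size <= limit, size <= LOGGING_THRESHOLD


if __name__ == "__main__":
    size, ok, logged = check_payload({"inputs": [[1.0, 2.0, 3.0]]})
    print(f"{size} bytes, within limit: {ok}, logged: {logged}")
```

A check like this only catches oversized requests early; the service remains the authority on what it accepts.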
Networking and security limitations
- Model Serving endpoints are protected by access control and respect networking-related ingress rules configured on the workspace, like IP allowlists and PrivateLink.
- By default, Model Serving does not support PrivateLink to external endpoints. Support for this functionality is evaluated and implemented on a per-region basis. Reach out to your Databricks account team for more information.
- Model Serving does not provide security patches to existing model images because of the risk of destabilization to production deployments. A new model image created from a new model version will contain the latest patches. Reach out to your Databricks account team for more information.
- You can restrict outbound network access from Model Serving endpoints by configuring network policies. See Manage network policies for serverless egress control.
Compliance security profile standards: CPU and GPU workloads
The following table lists the region availability and supported compliance security profile standards for model serving on CPU and GPU workloads, including external models.
These compliance standards require served containers to be built in the most recent 30 days. Databricks automatically rebuilds outdated containers on your behalf. However, if this automated job fails, an event log message like the following appears and provides guidance on how to ensure your endpoints stay within compliance requirements:
"Databricks couldn't complete a scheduled compliance check for model $servedModelName. This can happen if the system can't apply a required update. To resolve, try relogging your model. If the issue persists, contact support@databricks.com."
Region | Location | HIPAA | PCI-DSS | FedRAMP Moderate | IRAP | CCCS Medium (Protected B) | UK Cyber Essentials Plus |
---|---|---|---|---|---|---|---|
| Asia Pacific (Tokyo) | ✓ | ✓ | | | | |
| Asia Pacific (Seoul) | ✓ | ✓ | | | | |
| Asia Pacific (Mumbai) | ✓ | ✓ | | | | |
| Asia Pacific (Singapore) | ✓ | ✓ | | | | |
| Asia Pacific (Sydney) | ✓ | ✓ | | ✓ | | |
| Canada (Central) | ✓ | ✓ | | | ✓ | |
| EU (Frankfurt) | ✓ | ✓ | | | | |
| EU (Ireland) | ✓ | ✓ | | | | |
| EU (London) | ✓ | ✓ | | | | ✓ |
| EU (Paris) | | | | | | |
| South America (Sao Paulo) | ✓ | ✓ | | | | |
| US East (Northern Virginia) | ✓ | ✓ | ✓ | | | |
| US East (Ohio) | ✓ | ✓ | ✓ | | | |
| US Gov West (Pendleton) | | | | | | |
| US West (Northern California) | | | | | | |
| US West (Oregon) | ✓ | ✓ | ✓ | | | |
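For scripted checks, the CPU and GPU workloads table above can be transcribed into a lookup structure. The sketch below is a snapshot of that table at the time of writing, not an API provided by Databricks; the names `CPU_GPU_COMPLIANCE`, `supports`, and `locations_supporting` are hypothetical, and the data goes stale whenever the table changes.

```python
# HIPAA and PCI-DSS are supported together in every location that has any standard.
BASE = frozenset({"HIPAA", "PCI-DSS"})

# Transcribed from the CPU and GPU workloads table; keyed by the Location column.
CPU_GPU_COMPLIANCE = {
    "Asia Pacific (Tokyo)": BASE,
    "Asia Pacific (Seoul)": BASE,
    "Asia Pacific (Mumbai)": BASE,
    "Asia Pacific (Singapore)": BASE,
    "Asia Pacific (Sydney)": BASE | {"IRAP"},
    "Canada (Central)": BASE | {"CCCS Medium (Protected B)"},
    "EU (Frankfurt)": BASE,
    "EU (Ireland)": BASE,
    "EU (London)": BASE | {"UK Cyber Essentials Plus"},
    "EU (Paris)": frozenset(),
    "South America (Sao Paulo)": BASE,
    "US East (Northern Virginia)": BASE | {"FedRAMP Moderate"},
    "US East (Ohio)": BASE | {"FedRAMP Moderate"},
    "US Gov West (Pendleton)": frozenset(),
    "US West (Northern California)": frozenset(),
    "US West (Oregon)": BASE | {"FedRAMP Moderate"},
}


def supports(location, standard):
    """True if the table marks `standard` as supported in `location`."""
    return standard in CPU_GPU_COMPLIANCE.get(location, frozenset())


def locations_supporting(standard):
    """Sorted list of locations where the table marks `standard` as supported."""
    return sorted(loc for loc, stds in CPU_GPU_COMPLIANCE.items() if standard in stds)


if __name__ == "__main__":
    print(locations_supporting("FedRAMP Moderate"))
```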
Compliance security profile standards: Foundation Model APIs workloads
The following table lists the supported compliance security profile standards for the following Foundation Model APIs workloads:
- Provisioned throughput
- Pay-per-token
- Batch inference using AI Functions and Databricks-hosted models
These compliance standards require served containers to be built in the most recent 30 days. Databricks automatically rebuilds outdated containers on your behalf. However, if this automated job fails, an event log message like the following appears and provides guidance on how to ensure your endpoints stay within compliance requirements:
"Databricks couldn't complete a scheduled compliance check for model $servedModelName. This can happen if the system can't apply a required update. To resolve, try relogging your model. If the issue persists, contact support@databricks.com."
Region | Location | HIPAA | PCI-DSS | FedRAMP Moderate | IRAP | CCCS Medium (Protected B) | UK Cyber Essentials Plus |
---|---|---|---|---|---|---|---|
| Asia Pacific (Tokyo) | ✓ | ✓ | | | | |
| Asia Pacific (Seoul) | ✓ | ✓ | | | | |
| Asia Pacific (Mumbai) | ✓ | ✓ | | | | |
| Asia Pacific (Singapore) | ✓ | ✓ | | | | |
| Asia Pacific (Sydney) | ✓ | ✓ | | ✓ | | |
| Canada (Central) | ✓ | ✓ | | | ✓ | |
| EU (Frankfurt) | ✓ | ✓ | | | | |
| EU (Ireland) | ✓ | ✓ | | | | |
| EU (London) | ✓ | ✓ | | | | ✓* |
| EU (Paris) | | | | | | |
| South America (Sao Paulo) | ✓ | ✓ | | | | |
| US East (Northern Virginia) | ✓ | ✓ | ✓ | | | |
| US East (Ohio) | ✓ | ✓ | ✓ | | | |
| US Gov West (Pendleton) | | | | | | |
| US West (Northern California) | | | | | | |
| US West (Oregon) | ✓ | ✓ | ✓ | | | |
* Some models require cross-geography routing for provisioned throughput and are therefore not UK Cyber Essentials Plus compliant. Reach out to your Databricks account team for more information.
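If you collect endpoint event log messages as plain strings, the compliance check failure quoted earlier in this section can be detected with a pattern match. This is a minimal sketch under stated assumptions: the pattern is derived only from the sample message in this article, the real wording may differ, and `models_needing_relog` is a hypothetical helper. How the messages are retrieved is out of scope here.

```python
import re

# Assumption: real messages follow the sample wording quoted in this article.
# "couldn.t" tolerates either a straight or a typographic apostrophe.
_COMPLIANCE_FAILURE = re.compile(
    r"couldn.t complete a scheduled compliance check for model (?P<name>.+?)\. This can happen"
)


def models_needing_relog(messages):
    """Return served model names mentioned in compliance check failure messages.

    Names are returned in first-seen order without duplicates.
    """
    names = []
    for message in messages:
        match = _COMPLIANCE_FAILURE.search(message)
        if match and match.group("name") not in names:
            names.append(match.group("name"))
    return names


if __name__ == "__main__":
    sample = [
        "Databricks couldn't complete a scheduled compliance check for model "
        "my-model-1. This can happen if the system can't apply a required update. "
        "To resolve, try relogging your model.",
        "Endpoint update succeeded.",
    ]
    print(models_needing_relog(sample))
```

The model name `my-model-1` is a placeholder; the sample message in this article shows the name only as `$servedModelName`.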
Foundation Model APIs limits
For detailed information about Foundation Model APIs, see:
- Rate limits and quotas: Foundation Model APIs rate limits and quotas - Includes TPM limits, regional availability, and model-specific restrictions
- Compliance and security: Foundation Model APIs compliance and security - Covers compliance standards, data processing, and security requirements
Region availability
If you require an endpoint in an unsupported region, reach out to your Databricks account team.
If your workspace is deployed in a region that supports model serving but is served by a control plane in an unsupported region, the workspace does not support model serving. If you attempt to use model serving in such a workspace, you see an error message stating that your workspace is not supported. Reach out to your Databricks account team for more information.
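The rule above means that both the workspace region and its control plane region must be supported. A minimal sketch of that check follows; the function name and the region identifiers are placeholders invented for illustration, and the set of supported regions must come from the current availability tables.

```python
def workspace_supports_model_serving(workspace_region, control_plane_region, supported_regions):
    """Apply the rule from this section: both regions must support model serving."""
    return (
        workspace_region in supported_regions
        and control_plane_region in supported_regions
    )


if __name__ == "__main__":
    # Placeholder region names, not real region identifiers.
    supported = {"region-a", "region-b"}
    print(workspace_supports_model_serving("region-a", "region-b", supported))  # both supported
    print(workspace_supports_model_serving("region-a", "region-c", supported))  # control plane unsupported
```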
See Model serving features availability for more information on regional availability of each Model Serving feature.
For Databricks-hosted foundation model region availability, see Foundation models hosted on Databricks.