SQL warehouse sizing, scaling, and queuing behavior

This article explains how to size, scale, and manage query queues for Databricks SQL warehouses to optimize performance and cost. Databricks recommends using a serverless SQL warehouse for most workloads. Serverless SQL warehouses deliver the best performance and efficiency by dynamically managing resources for your queries.

Serverless SQL warehouse management

Serverless SQL warehouses use Intelligent Workload Management (IWM) to automatically manage query workloads. IWM is a set of AI-powered features that process queries quickly and cost-effectively without requiring you to manage infrastructure.

Intelligent Workload Management and autoscaling

IWM uses machine learning models to dynamically manage compute resources:

When a new query arrives, IWM predicts its resource requirements and checks available capacity.
- If capacity exists, the query starts immediately.
- If not, the query is placed in a queue.
IWM continuously monitors the queue. If wait times increase, the autoscaler quickly provisions more clusters to process queued queries.
When demand drops, IWM scales down resources to reduce costs while keeping enough capacity to handle recent peak loads.

This approach provides:

Rapid upscaling to maintain low query latency.
High throughput by admitting queries as soon as hardware is available.
Quick downscaling to save costs during low demand.

Sizing a serverless SQL warehouse

Cluster size (for example, X-Small, Medium, Large) determines the compute resources available for a single cluster. The autoscaler adds or removes clusters of that size as needed.

Use the following guidelines to help choose the right size:

Start with a single larger warehouse and let serverless features manage concurrency and performance. It is usually more efficient to size down if necessary than to start small and scale up.
If queries are spilling to disk, increase the cluster size. Check for spills in the query profile.
For workloads with many concurrent queries, configure a sufficient maximum number of clusters to handle peak loads. Monitor the Peak Queued Queries metric on the warehouse monitoring page.

note

For serverless SQL warehouses, the cluster sizes may, in some cases, use different instance types than the ones listed in the documentation for pro and classic SQL warehouses for equivalent cluster size. In general, the price/performance ratio of the cluster sizes for serverless SQL warehouses is similar to those for pro and classic SQL warehouses.

Monitoring warehouse performance

You can monitor and right-size any SQL warehouse using these tools. The maximum number of queries in a queue for all warehouse types is 1,000.

Monitoring page: On the SQL warehouse monitoring tab, check Peak Queued Queries. A consistent value above 0 indicates that you may need a larger cluster size or more clusters.
Query history: Review historical query performance to identify bottlenecks.
Query profile: Inspect execution plans for metrics such as Bytes spilled to disk, which indicates that the warehouse size may be too small.

Classic and pro SQL warehouses

Classic and pro warehouses use a manual scaling model where you configure the number of clusters.

Sizing and cluster provisioning

Beta

The 5X-Large cluster size is currently in Beta for pro and serverless SQL warehouses. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

When creating a classic or pro warehouse, choose a cluster size and set the minimum and maximum number of clusters. These SKUs have a fixed limit of one cluster per 10 concurrent queries.

Cluster size	Driver instance type	Worker count	Total vCPU	Total Persistent Disk SSD (TB)	Total Local SSD (TB)
2X-Small	n2-highmem-8	1 x n2-highmem-8	16	.2	1.5
X-Small	n2-highmem-8	2 x n2-highmem-8	24	.3	2.25
Small	n2-highmem-16	4 x n2-highmem-8	48	.5	4.5
Medium	n2-highmem-32	8 x n2-highmem-8	96	.9	9
Large	n2-highmem-32	16 x n2-highmem-8	160	1.7	18
X-Large	n2-highmem-64	32 x n2-highmem-8	320	3.3	30
2X-Large	n2-highmem-64	64 x n2-highmem-8	576	6.5	54
3X-Large	n2-highmem-64	128 x n2-highmem-8	1088	12.9	102
4X-Large	n2-highmem-64	256 x n2-highmem-8	2112	25.7	198
5X-Large	n2-highmem-64	512 x n2-highmem-8	4160	51.3	396

The instance size of all workers is n2-highmem-8.

note

The information in this table can vary based on product or region availability and workspace type.

Compute Engine API quota requirements

The relevant Compute Engine API quota fields are:

N2 CPUs
Persistent Disk SSD (GB)
Local SSD (GB)

For more information about quota requirements, see Compute Engine API.

warning

SQL warehouses won't start if you do not provision the required CPU and storage resources. See Compute Engine API. If needed, you can increase the resource quotas to support your use of SQL warehouses. See Review and increase quotas. For information about workspace cost, see cost per workspace.

Queuing and autoscaling logic

For classic and pro warehouses, autoscaling adds clusters based on the estimated time to process all running and queued queries:

2-6 minutes of query load: Add 1 cluster.
6-12 minutes: Add 2 clusters.
12-22 minutes: Add 3 clusters.
Over 22 minutes: Add 3 clusters plus 1 more for every additional 15 minutes of load.

Additional rules:

If a query waits in the queue for 5 minutes, the warehouse scales up.
If load remains low for 15 consecutive minutes, the warehouse scales down to the minimum needed to handle the peak load from that period.

Serverless SQL warehouse management​

Intelligent Workload Management and autoscaling​

Sizing a serverless SQL warehouse​

Monitoring warehouse performance​

Classic and pro SQL warehouses​

Sizing and cluster provisioning​

Compute Engine API quota requirements​

Queuing and autoscaling logic​