SQL warehouse settings for BI workloads

Business intelligence workloads have distinct characteristics that require specific SQL warehouse configuration considerations. This page provides guidance on analyzing your BI workload requirements and configuring SQL warehouses to deliver optimal performance, cost-efficiency, and reliability.

Workload analysis and SLA requirements

Every BI workload is unique and requires careful analysis before configuration. Consider the following questions when evaluating your requirements:

  • Migration or new implementation: Is this workload being migrated from a different platform, or is it a new implementation? Migrated workloads might have established SLAs and performance baselines.
  • Service level agreements (SLAs): What are your latency, throughput, and availability requirements? Document both technical and business SLAs.
  • Access patterns: How do users interact with data? Understanding typical query patterns helps right-size your warehouse configuration and optimize the data layer for the specific workload.

Typical BI access patterns

BI workloads typically fall into two distinct access pattern categories, each requiring different SQL warehouse configurations.

DirectQuery / LiveQuery pattern

DirectQuery patterns query data in real time and require low-latency responses for interactive analytics:

Characteristics:

  • High number of queries
  • Queries usually return small result sets (less than 1,000 records)
  • Usually executed during business hours
  • Strict SLA requirements with low latency expectations
  • Unpredictable query patterns (dashboards, reports)
  • Data accessed per query is typically less than 5 GB
  • Requires highly scalable compute to accommodate spiky patterns

Performance expectations:

  • Query response time: seconds (typically less than 5 seconds for interactive dashboards)
  • Data freshness: Up-to-date, reflecting the most recent data

Workload profile:

  • Frequent spikes during business hours
  • Unpredictable load variations (user-driven)
  • Can extend to 24x7 for global organizations

Import / Extract pattern

Import patterns extract data for downstream systems, prioritizing throughput over latency:

Characteristics:

  • Low number of queries (scheduled refreshes)
  • Usually large result sets (more than 1,000,000 records)
  • Usually scheduled during off-peak hours
  • Predictable query patterns (scheduled, known in advance)
  • Data accessed per query: up to tens of GB

Performance expectations:

  • Query response time: minutes to hours (batch-oriented)
  • Data freshness: Daily snapshot, typically reflecting the previous day's data

Workload profile:

  • Scheduled, predictable execution windows
  • Known workload characteristics and resource requirements
  • Batch-oriented processing

Query mix in DirectQuery workloads

When using DirectQuery patterns with a star schema data model, expect the following query distribution:

  • Dimension queries: Many small queries scanning dimension tables (customer, product, time)
  • Fact queries: Many large queries scanning fact tables with joins and aggregations
  • Extract queries: Some simple but long-running queries for large data extracts

This varied query mix requires SQL warehouses that can efficiently handle both small, frequent queries and large analytical queries simultaneously.
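The mix above can be sketched as a simple classifier over per-query statistics. The thresholds below are illustrative only, taken from the characteristics listed earlier on this page (result sets under 1,000 rows, per-query scans under 5 GB, extracts over 1,000,000 rows); they are assumptions for this sketch, not Databricks-defined limits.

```python
# Illustrative classifier for the BI query mix described above.
# Thresholds are assumptions drawn from the characteristics listed
# earlier on this page, not Databricks-defined limits.

GB = 1024 ** 3

def classify_query(rows_returned: int, bytes_scanned: int) -> str:
    """Roughly bucket a query into the dimension/fact/extract mix."""
    if rows_returned < 1_000 and bytes_scanned < 5 * GB:
        return "dimension"   # small, frequent lookups on dimension tables
    if rows_returned >= 1_000_000:
        return "extract"     # large result sets destined for downstream systems
    return "fact"            # joins and aggregations over fact tables

print(classify_query(200, 10 * 1024 ** 2))        # dimension
print(classify_query(5_000_000, 40 * GB))         # extract
print(classify_query(50_000, 20 * GB))            # fact
```

In practice you would feed this kind of heuristic with statistics from query history rather than hard-coded values; the point is that all three buckets hit the same warehouse concurrently, which motivates the isolation strategy below.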

Multi-warehouse strategy for workload isolation

Databricks recommends provisioning multiple SQL warehouses to achieve:

Right-sizing and optimal costs

  • Size each warehouse appropriately for its specific workload pattern
  • Avoid over-provisioning by separating workloads with different resource requirements
  • Use smaller warehouses for development and testing, larger for production
  • Use warehouse scalability to find the ideal balance between performance and cost

Better overall performance

  • Prevent resource contention between DirectQuery and Import/Extract patterns
  • Isolate interactive dashboards from batch refresh operations
  • Enable independent scaling based on workload demands

Cross-charging and cost allocation

  • Track usage and costs by business unit, project, or team
  • Enable accurate chargeback models
  • Improve cost visibility and accountability

More efficient administration and management

  • Assign ownership and management responsibilities by team or project
  • Apply different auto-stop policies based on usage patterns
  • Configure separate access controls and monitoring
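As a sketch, the strategy above might translate into two warehouse definitions shaped like the Databricks SQL Warehouses API payload: one serverless warehouse for interactive dashboards and one Pro warehouse for extracts, each with its own auto-stop policy and a cost-allocation tag. The field names (`cluster_size`, `auto_stop_mins`, `tags.custom_tags`, `enable_serverless_compute`, `warehouse_type`) reflect the API as documented at the time of writing; verify them against the current API reference before use. The warehouse names and team tags are hypothetical.

```python
# Sketch: two warehouse definitions for workload isolation and chargeback,
# shaped like the Databricks SQL Warehouses API payload. Field names should
# be verified against the current API reference; names/tags are hypothetical.

def warehouse_config(name: str, size: str, auto_stop_mins: int,
                     team: str, **extra) -> dict:
    cfg = {
        "name": name,
        "cluster_size": size,
        "auto_stop_mins": auto_stop_mins,
        # Custom tags surface in usage data, enabling chargeback by team.
        "tags": {"custom_tags": [{"key": "team", "value": team}]},
    }
    cfg.update(extra)
    return cfg

# Interactive dashboards: serverless, aggressive auto-stop.
dashboards = warehouse_config("bi-dashboards", "Medium", 15,
                              "sales-analytics",
                              enable_serverless_compute=True)

# Scheduled extracts: Pro warehouse, longer auto-stop for sequential jobs.
extracts = warehouse_config("bi-extracts", "Large", 120,
                            "data-platform",
                            warehouse_type="PRO")
```

Keeping the two definitions separate lets each warehouse scale, stop, and be billed independently, which is the core of the multi-warehouse strategy.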

For DirectQuery / LiveQuery workloads

  • Use serverless SQL warehouses for automatic resource management
  • Configure aggressive auto-stop (15-30 minutes) for cost optimization
  • Set cluster size based on query complexity and data volume (start with Medium, scale up if needed)
  • Set min and max number of clusters based on the anticipated workload
  • Monitor the Peak Queued Queries metric and adjust max clusters accordingly
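A back-of-envelope way to pick the min and max cluster counts is to divide expected concurrency by the per-cluster query capacity. The figure of roughly 10 concurrent queries per cluster is a common rule of thumb, not a hard Databricks limit, so treat the result as a starting point and adjust it using the Peak Queued Queries metric.

```python
import math

# Back-of-envelope autoscaling bounds for a DirectQuery warehouse.
# Assumes, as a rule of thumb (not a hard Databricks limit), that one
# cluster handles roughly 10 concurrent queries before queueing begins.
QUERIES_PER_CLUSTER = 10

def autoscaling_bounds(baseline_concurrency: int,
                       peak_concurrency: int) -> tuple[int, int]:
    """Return (min_clusters, max_clusters) for the anticipated workload."""
    min_clusters = max(1, math.ceil(baseline_concurrency / QUERIES_PER_CLUSTER))
    max_clusters = max(min_clusters,
                       math.ceil(peak_concurrency / QUERIES_PER_CLUSTER))
    return min_clusters, max_clusters

print(autoscaling_bounds(8, 45))  # (1, 5)
```

If Peak Queued Queries stays high at the computed maximum, raise the max cluster count rather than the cluster size: queueing indicates a concurrency bottleneck, not a per-query compute bottleneck.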

For Import / Extract workloads

  • Use Pro or Classic SQL warehouses for predictable, scheduled jobs
  • Configure longer auto-stop times (1-2 hours) if multiple jobs run in sequence
  • Use larger cluster sizes (Large, X-Large) for complex aggregations
  • Consider fixed scheduling to align with batch windows
  • Monitor query duration and adjust sizing based on SLA requirements
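The recommendations above could be submitted through the Databricks SQL Warehouses REST API, sketched below with only the standard library. The endpoint (`POST /api/2.0/sql/warehouses`) and field names reflect the API as documented at the time of writing and should be verified against the current API reference; the host URL and token are placeholders for your workspace URL and personal access token.

```python
import json
import urllib.request

# Hedged sketch: create an Import/Extract warehouse via the Databricks
# SQL Warehouses REST API. Endpoint and field names should be verified
# against the current API reference; HOST/TOKEN are placeholders.

def build_create_request(host: str, token: str,
                         payload: dict) -> urllib.request.Request:
    """Build (but do not send) the warehouse-creation request."""
    return urllib.request.Request(
        url=f"{host}/api/2.0/sql/warehouses",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

extract_warehouse = {
    "name": "bi-extracts",
    "warehouse_type": "PRO",   # Pro/Classic for predictable, scheduled jobs
    "cluster_size": "Large",   # complex aggregations over tens of GB
    "auto_stop_mins": 120,     # longer auto-stop for sequential batch jobs
    "max_num_clusters": 1,     # batch extracts rarely need concurrency scaling
}

req = build_create_request("https://example.cloud.databricks.com",
                           "<personal-access-token>", extract_warehouse)
# urllib.request.urlopen(req) would submit the request; omitted here.
```

Separating request construction from submission makes the payload easy to review or version-control alongside the batch jobs that depend on the warehouse.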

For more information about SQL warehouse sizing and scaling behavior, see SQL warehouse sizing, scaling, and queuing behavior.

For a quick reference of BI serving best practices, see BI serving cheat sheet.