SQL warehouse settings for BI workloads

Business intelligence workloads have distinct characteristics that require specific SQL warehouse configuration considerations. This page provides guidance on analyzing your BI workload requirements and configuring SQL warehouses to deliver optimal performance, cost-efficiency, and reliability.

Workload analysis and SLA requirements

Every BI workload is unique and requires careful analysis before configuration. Consider the following questions when evaluating your requirements:

  • Migration or new implementation: Is this workload being migrated from a different platform, or is it a new implementation? Migrated workloads might have established SLAs and performance baselines.
  • Service level agreements (SLAs): What are your latency, throughput, and availability requirements? Document both technical and business SLAs.
  • Access patterns: How do users interact with data? Understanding typical query patterns helps right-size your warehouse configuration and optimize the data layer for the specific workload.

Typical BI access patterns

BI workloads typically fall into two distinct access pattern categories, each requiring different SQL warehouse configurations.

DirectQuery / LiveQuery pattern

DirectQuery patterns query data in real time and require low-latency responses for interactive analytics:

Characteristics:

  • High number of queries
  • Queries usually return small result sets (less than 1,000 records)
  • Usually executed during business hours
  • Strict SLA requirements with low latency expectations
  • Unpredictable query patterns (dashboards, reports)
  • Data accessed per query is typically less than 5 GB
  • Requires highly scalable compute to accommodate spiky patterns

Performance expectations:

  • Query response time: seconds (typically less than 5 seconds for interactive dashboards)
  • Data freshness: Up-to-date, reflecting the most recent data

Workload profile:

  • Frequent spikes during business hours
  • Unpredictable load variations (user-driven)
  • Can extend to 24x7 for global organizations

Import / Extract pattern

Import patterns extract data for downstream systems, prioritizing throughput over latency:

Characteristics:

  • Low number of queries (scheduled refreshes)
  • Usually large result sets (more than 1,000,000 records)
  • Usually scheduled during off-peak hours
  • Predictable query patterns (scheduled, known in advance)
  • Data accessed per query: up to tens of GB

Performance expectations:

  • Query response time: minutes to hours (batch-oriented)
  • Data freshness: Daily snapshot, typically reflecting the previous day's data

Workload profile:

  • Scheduled, predictable execution windows
  • Known workload characteristics and resource requirements
  • Batch-oriented processing

Query mix in DirectQuery workloads

When using DirectQuery patterns with a star schema data model, expect the following query distribution:

  • Dimension queries: Many small queries scanning dimension tables (customer, product, time)
  • Fact queries: Many large queries scanning fact tables with joins and aggregations
  • Extract queries: Some simple but long-running queries for large data extracts

This varied query mix requires SQL warehouses that can efficiently handle both small, frequent queries and large analytical queries simultaneously.
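The mix above can be sketched as a simple classifier over per-query statistics. The thresholds below are illustrative only, taken from the characteristics listed earlier on this page (result sets under 1,000 rows, per-query scans under 5 GB, extracts over 1,000,000 rows); they are assumptions for this sketch, not Databricks-defined limits.

```python
# Illustrative classifier for the BI query mix described above.
# Thresholds are assumptions drawn from the characteristics listed
# earlier on this page, not Databricks-defined limits.

GB = 1024 ** 3

def classify_query(rows_returned: int, bytes_scanned: int) -> str:
    """Roughly bucket a query into the dimension/fact/extract mix."""
    if rows_returned < 1_000 and bytes_scanned < 5 * GB:
        return "dimension"   # small, frequent lookups on dimension tables
    if rows_returned >= 1_000_000:
        return "extract"     # large result sets destined for downstream systems
    return "fact"            # joins and aggregations over fact tables

print(classify_query(200, 10 * 1024 ** 2))        # dimension
print(classify_query(5_000_000, 40 * GB))         # extract
print(classify_query(50_000, 20 * GB))            # fact
```

In practice you would feed this kind of heuristic with statistics from query history rather than hard-coded values; the point is that all three buckets hit the same warehouse concurrently, which motivates the isolation strategy below.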

Multi-warehouse strategy for workload isolation

Databricks recommends provisioning multiple SQL warehouses to achieve:

Right-sizing and optimal costs

  • Size each warehouse appropriately for its specific workload pattern
  • Avoid over-provisioning by separating workloads with different resource requirements
  • Use smaller warehouses for development and testing, larger for production
  • Use warehouse scalability to find the ideal balance between performance and cost

Better overall performance

  • Prevent resource contention between DirectQuery and Import/Extract patterns
  • Isolate interactive dashboards from batch refresh operations
  • Enable independent scaling based on workload demands

Cross-charging and cost allocation

  • Track usage and costs by business unit, project, or team
  • Enable accurate chargeback models
  • Improve cost visibility and accountability

More efficient administration and management

  • Assign ownership and management responsibilities by team or project
  • Apply different auto-stop policies based on usage patterns
  • Configure separate access controls and monitoring
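As a sketch, the strategy above might translate into two warehouse definitions shaped like the Databricks SQL Warehouses API payload: one serverless warehouse for interactive dashboards and one Pro warehouse for extracts, each with its own auto-stop policy and a cost-allocation tag. The field names (`cluster_size`, `auto_stop_mins`, `tags.custom_tags`, `enable_serverless_compute`, `warehouse_type`) reflect the API as documented at the time of writing; verify them against the current API reference before use. The warehouse names and team tags are hypothetical.

```python
# Sketch: two warehouse definitions for workload isolation and chargeback,
# shaped like the Databricks SQL Warehouses API payload. Field names should
# be verified against the current API reference; names/tags are hypothetical.

def warehouse_config(name: str, size: str, auto_stop_mins: int,
                     team: str, **extra) -> dict:
    cfg = {
        "name": name,
        "cluster_size": size,
        "auto_stop_mins": auto_stop_mins,
        # Custom tags surface in usage data, enabling chargeback by team.
        "tags": {"custom_tags": [{"key": "team", "value": team}]},
    }
    cfg.update(extra)
    return cfg

# Interactive dashboards: serverless, aggressive auto-stop.
dashboards = warehouse_config("bi-dashboards", "Medium", 15,
                              "sales-analytics",
                              enable_serverless_compute=True)

# Scheduled extracts: Pro warehouse, longer auto-stop for sequential jobs.
extracts = warehouse_config("bi-extracts", "Large", 120,
                            "data-platform",
                            warehouse_type="PRO")
```

Keeping the two definitions separate lets each warehouse scale, stop, and be billed independently, which is the core of the multi-warehouse strategy.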

For DirectQuery / LiveQuery workloads

  • Use serverless SQL warehouses for automatic resource management
  • Configure aggressive auto-stop (15-30 minutes) for cost optimization
  • Set cluster size based on query complexity and data volume (start with Medium, scale up if needed)
  • Set min and max number of clusters based on the anticipated workload
  • Monitor the Peak Queued Queries metric and adjust max clusters accordingly
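A back-of-envelope way to pick the min and max cluster counts is to divide expected concurrency by the per-cluster query capacity. The figure of roughly 10 concurrent queries per cluster is a common rule of thumb, not a hard Databricks limit, so treat the result as a starting point and adjust it using the Peak Queued Queries metric.

```python
import math

# Back-of-envelope autoscaling bounds for a DirectQuery warehouse.
# Assumes, as a rule of thumb (not a hard Databricks limit), that one
# cluster handles roughly 10 concurrent queries before queueing begins.
QUERIES_PER_CLUSTER = 10

def autoscaling_bounds(baseline_concurrency: int,
                       peak_concurrency: int) -> tuple[int, int]:
    """Return (min_clusters, max_clusters) for the anticipated workload."""
    min_clusters = max(1, math.ceil(baseline_concurrency / QUERIES_PER_CLUSTER))
    max_clusters = max(min_clusters,
                       math.ceil(peak_concurrency / QUERIES_PER_CLUSTER))
    return min_clusters, max_clusters

print(autoscaling_bounds(8, 45))  # (1, 5)
```

If Peak Queued Queries stays high at the computed maximum, raise the max cluster count rather than the cluster size: queueing indicates a concurrency bottleneck, not a per-query compute bottleneck.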

For Import / Extract workloads

  • Use Pro or Classic SQL warehouses for predictable, scheduled jobs
  • Configure longer auto-stop times (1-2 hours) if multiple jobs run in sequence
  • Use larger cluster sizes (Large, X-Large) for complex aggregations
  • Consider fixed scheduling to align with batch windows
  • Monitor query duration and adjust sizing based on SLA requirements
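The recommendations above could be submitted through the Databricks SQL Warehouses REST API, sketched below with only the standard library. The endpoint (`POST /api/2.0/sql/warehouses`) and field names reflect the API as documented at the time of writing and should be verified against the current API reference; the host URL and token are placeholders for your workspace URL and personal access token.

```python
import json
import urllib.request

# Hedged sketch: create an Import/Extract warehouse via the Databricks
# SQL Warehouses REST API. Endpoint and field names should be verified
# against the current API reference; HOST/TOKEN are placeholders.

def build_create_request(host: str, token: str,
                         payload: dict) -> urllib.request.Request:
    """Build (but do not send) the warehouse-creation request."""
    return urllib.request.Request(
        url=f"{host}/api/2.0/sql/warehouses",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

extract_warehouse = {
    "name": "bi-extracts",
    "warehouse_type": "PRO",   # Pro/Classic for predictable, scheduled jobs
    "cluster_size": "Large",   # complex aggregations over tens of GB
    "auto_stop_mins": 120,     # longer auto-stop for sequential batch jobs
    "max_num_clusters": 1,     # batch extracts rarely need concurrency scaling
}

req = build_create_request("https://example.cloud.databricks.com",
                           "<personal-access-token>", extract_warehouse)
# urllib.request.urlopen(req) would submit the request; omitted here.
```

Separating request construction from submission makes the payload easy to review or version-control alongside the batch jobs that depend on the warehouse.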

For more information about SQL warehouse sizing and scaling behavior, see SQL warehouse sizing, scaling, and queuing behavior.

For a quick reference of BI serving best practices, see BI serving cheat sheet.