Query performance insights
Preview
This feature is in Private Preview. To try it, reach out to your Databricks contact.
When queries run, Databricks might return insights that identify opportunities to improve performance. This page lists the supported insights and their meaning.
For a broader overview of performance best practices, review the Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads.
CONCURRENT_WRITE
- Concurrent writes to the table cause conflicts that are either resolved automatically or cause the write to fail.
- Recommendation: Review the Delta table history to identify concurrent writes, and consider rescheduling the writers so that they don't overlap.
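One way to spot overlapping writers in the table history is to compare each commit's version with the version it read. The sketch below is a heuristic, not a Databricks tool: it assumes you have already exported the history rows (for example, from `DESCRIBE HISTORY`) into dictionaries with `version`, `readVersion`, and `operation` fields, and the sample rows are made up for illustration.

```python
# Hypothetical rows shaped like a subset of DESCRIBE HISTORY output.
history = [
    {"version": 10, "readVersion": 9, "operation": "WRITE"},
    {"version": 11, "readVersion": 9, "operation": "MERGE"},
    {"version": 12, "readVersion": 11, "operation": "OPTIMIZE"},
    {"version": 13, "readVersion": 10, "operation": "MERGE"},
]

def overlapped_commits(rows):
    """Return (version, operation, commits_in_between) for commits that raced with others.

    A commit that read version R and landed as version V had V - R - 1 other
    commits land while it was running.
    """
    result = []
    for row in rows:
        if row.get("readVersion") is None:
            continue  # some operations don't record a read version
        gap = row["version"] - row["readVersion"] - 1
        if gap > 0:
            result.append((row["version"], row["operation"], gap))
    return result

overlaps = overlapped_commits(history)
for version, operation, gap in overlaps:
    print(f"version {version} ({operation}) overlapped with {gap} other commit(s)")
```

Operations that show up repeatedly in this list are candidates for rescheduling.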
COVERAGE_FILTER_KEYS_CLUSTERING
- The table is clustered by one or more keys that aren't used in filtering during the table scan.
- Recommendation: Determine which data subset you need for the desired outcome, then add filters on matching clustering keys to reduce bytes read.
COVERAGE_FILTER_KEYS_PARTITIONING
- The table is partitioned by one or more keys that aren't used in filtering during the table scan.
- Recommendation: Determine which data subset you need for the desired outcome, then add filters on matching partitioning keys to reduce bytes read.
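To illustrate why a filter on the partitioning key reduces bytes read, the following sketch models a partitioned table as a mapping from partition value to file sizes. The layout, sizes, and the `bytes_read` helper are hypothetical and only mimic the effect of partition pruning.

```python
# Hypothetical table layout: partition value -> sizes (in bytes) of its files.
partitions = {
    "2024-01-01": [120, 80],
    "2024-01-02": [200],
    "2024-01-03": [150, 50, 100],
}

def bytes_read(layout, wanted=None):
    """Return the bytes a scan reads; wanted=None models a query with no partition filter."""
    return sum(
        sum(sizes)
        for key, sizes in layout.items()
        if wanted is None or key in wanted
    )

full_scan = bytes_read(partitions)                           # no filter on the key
pruned_scan = bytes_read(partitions, wanted={"2024-01-02"})  # filter on the key
print(full_scan, pruned_scan)
```

Without the filter, every partition is scanned; with it, only the files in the matching partition are read.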
COVERAGE_PHOTON
- Photon can't accelerate the operation, so the query used the standard runtime engine.
- Recommendation: Review Photon limitations and consider adjusting the query to use a supported execution strategy for faster runtime.
COVERAGE_STATS_DELTA
- Delta data skipping statistics are missing or incomplete for the table scan file filters, so the query uses in-file filtering. The following statistics statuses are possible:
  - Full: Statistics are available for all filters.
  - Partial: Statistics are available for a subset of filters.
  - Unavailable: Statistics are not available for any filter.
  - Unused: Statistics can't be used for a filter that converts the data type.
- Recommendation: Collect Delta statistics to reduce the number of bytes read.
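The difference between skipping files and in-file filtering can be sketched with per-file minimum and maximum values. This is a simplified model with made-up file names and a hypothetical `FileStats` type, not the actual Delta implementation: a file is skipped only when its statistics prove that no row can match the filter.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileStats:
    path: str
    min_value: Optional[int]  # None models a file with no collected statistics
    max_value: Optional[int]

def files_to_read(files, lo, hi):
    """Keep files whose [min, max] range may overlap the filter lo <= x <= hi."""
    kept = []
    for f in files:
        if f.min_value is None or f.max_value is None:
            kept.append(f.path)  # no stats: the file must be read and filtered row by row
        elif f.max_value >= lo and f.min_value <= hi:
            kept.append(f.path)  # ranges overlap: the file may contain matching rows
    return kept

files = [
    FileStats("part-0", 0, 99),
    FileStats("part-1", 100, 199),
    FileStats("part-2", None, None),
    FileStats("part-3", 200, 299),
]
selected = files_to_read(files, 120, 150)
print(selected)
```

In this model, the file without statistics is always read, which corresponds to the Partial and Unavailable statuses above.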
COVERAGE_STATS_OPTIMIZER
- Cost-based optimizer statistics are missing or incomplete, so standard heuristics were used to generate the query plan.
- Recommendation: Collect statistics to enable the optimizer to produce a better plan.
DATA_SKEW
- Data is distributed unevenly across the available compute resources, so some tasks process far more data than others.
- Recommendation: Review the distribution of the data, then salt keys or pre-aggregate the data.
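The salting recommendation can be sketched in plain Python as a two-stage aggregation. The key names, row counts, and `NUM_SALTS` value are hypothetical, and the salt is derived from the row index to keep the example deterministic; a real job would typically use a random salt.

```python
from collections import Counter

NUM_SALTS = 4  # hypothetical; choose based on the observed skew

# One "hot" key dominates the data.
rows = ["hot"] * 1000 + ["a", "b", "c"] * 10

# Without salting, all rows for a key end up in one group (and one task).
unsalted = Counter(rows)

# Stage 1: add a salt so the hot key is spread over NUM_SALTS groups,
# then pre-aggregate each (key, salt) group independently.
partial = Counter((key, i % NUM_SALTS) for i, key in enumerate(rows))

# Stage 2: drop the salt and merge the partial counts.
final = Counter()
for (key, _salt), count in partial.items():
    final[key] += count

largest_unsalted = max(unsalted.values())
largest_salted = max(partial.values())
print(largest_unsalted, largest_salted)
```

The largest group shrinks by a factor of `NUM_SALTS`, while the merged result matches the unsalted aggregation.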
EXPLODING_JOIN
- A join generates significantly more rows than it reads.
- Recommendation: Determine which result subset is required, then update the join or reduce the number of input rows from both relations.
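To see why a join can produce far more rows than it reads, note that an inner equi-join emits, for each key, the product of the matching row counts on both sides. The sketch below uses made-up keys and a hypothetical `join_output_rows` helper to compute that size without materializing the join.

```python
from collections import Counter

def join_output_rows(left_keys, right_keys):
    """Rows produced by an inner equi-join: sum of left_count * right_count per key."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(n * right[key] for key, n in left.items())

# Both inputs repeat the key "x" 100 times.
left_keys = ["x"] * 100 + ["y"]
right_keys = ["x"] * 100 + ["y"]

rows_read = len(left_keys) + len(right_keys)
rows_out = join_output_rows(left_keys, right_keys)

# Deduplicating one side before the join removes the multiplication.
deduped_out = join_output_rows(left_keys, set(right_keys))
print(rows_read, rows_out, deduped_out)
```

Running the same count over the join keys of your own inputs shows which keys drive the blow-up.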
IO_THROTTLING
- A cloud storage request was throttled by your cloud provider.
- Recommendation: Contact your administrator to increase your cloud storage request limits with your cloud provider.