Dataset optimization and caching

AI/BI dashboards are valuable data analysis and decision-making tools, and efficient load times can significantly improve the user experience. This article explains how caching and dataset optimizations make dashboards more performant and efficient.

Query performance

You can inspect queries and their performance in the workspace query history. The query history shows SQL queries performed using SQL warehouses. Click Query History in the sidebar to view the query history. See Query history.

For dashboard datasets, Databricks applies performance optimizations depending on the result size of the dataset. For information about dataset performance thresholds, see Dataset performance thresholds.

Dataset optimizations

Your dashboards optimize for speed by performing filtering and aggregation operations, driven by filters or visualization settings, directly in your browser when possible. These performance optimizations have the following limits:

Dataset Size	Processing Behavior
Small (≤ 100K rows and ≤ 100MB)	For optimal dashboard speed, filtering and aggregation run in your browser after the initial dataset loads. Because these operations are processed locally, they avoid further interaction with the data warehouse and don't appear in the query history.
Large (> 100K rows or > 100MB)	Filtering and aggregation are handled on the backend server instead of in your browser. The initial dataset query is wrapped in a SQL `WITH` clause, and the resulting query appears in the query history.
Combined queries (large datasets)	For visualization queries sent to the backend, separate visualization queries against the same dataset that share the same `GROUP BY` clauses and filter predicates are combined into a single query for processing. In this case, users may see a combined query in the query history that fetches results for multiple visualizations or filters.

note

Parameters substitute values directly into a query at runtime, so these operations always appear in the query history.

Caching and data freshness

Dashboards maintain a 24-hour result cache to optimize initial loading times, operating on a best-effort basis. This means that while the system always attempts to use historical query results linked to dashboard credentials to enhance performance, there are some cases where cached results cannot be created or maintained. Cached data has no specific memory limit or fixed query count.

To improve loading times, dashboards first check the dashboard cache. If no cache results are available, they check the generic query result cache. While the dashboard cache can return stale results for up to 24 hours, the query result cache never returns stale data. When the underlying data changes, all query result cache entries are invalidated.

For multi-page dashboards, the following apply:

Editing a draft dashboard loads and caches all datasets.
When viewers open a published dashboard, only datasets that support the active page are run and cached.
If a schedule is set, all datasets refresh according to the schedule, and those results are cached.

The following table explains how caching varies by dashboard status and credentials:

Dashboard type	Caching type
Dashboard published with shared data permissions	Shared cache. All viewers see the same results.
Draft dashboard or dashboard published with individual data permissions	Per user cache. Viewers see results based on their data permissions.

Dashboards automatically use cached query results if the underlying data remains unchanged after the last query or if the results were retrieved less than 24 hours ago. If stale results exist and parameters are applied to the dashboard, queries will rerun unless the same parameters were used in the past 24 hours. Similarly, applying filters to datasets exceeding 100,000 rows prompts queries to rerun unless the same filters were previously applied in the last 24 hours.

Current timestamp functions and cache invalidation

Using current_timestamp() or similar functions in your SQL query does not invalidate the dashboard-level cache. However, these functions do invalidate the query result cache, which inspects the SQL query, and trigger a cache refresh.

Scheduled queries

Adding a schedule to a dashboard published with shared data permissions can significantly speed up the initial loading process for all dashboard viewers.

For each scheduled dashboard update, the following occurs:

All SQL logic that defines datasets runs on the designated time interval.
Results populate the query result cache and help to improve initial dashboard load time.

Query performance​

Dataset optimizations​

Caching and data freshness​

Current timestamp functions and cache invalidation​

Scheduled queries​

Query performance

Dataset optimizations

Caching and data freshness

Current timestamp functions and cache invalidation

Scheduled queries