Dataset optimization and caching
AI/BI dashboards are valuable data analysis and decision-making tools, and efficient load times can significantly improve the user experience. This article explains how caching and dataset optimizations make dashboards more performant and efficient.
Query performance
You can inspect queries and their performance in the workspace query history. The query history shows SQL queries performed using SQL warehouses. Click Query History in the sidebar to view the query history. See Query history.
For dashboard datasets, Databricks applies performance optimizations depending on the result size of the dataset. For information about dataset performance thresholds, see Dataset performance thresholds.
Dataset optimizations
Your dashboards optimize for speed by performing filtering and aggregation operations, driven by filters or visualization settings, directly in your browser when possible. These performance optimizations have the following limits:
Dataset Size | Processing Behavior |
---|---|
Small (≤ 100K rows and ≤ 100MB) | For optimal dashboard speed, filtering and aggregation run in your browser after the initial dataset loads. Because these operations are processed locally, they avoid further interaction with the data warehouse and don't appear in the query history. |
Large (> 100K rows or > 100MB) | Filtering and aggregation are handled on the backend server instead of in your browser. The initial dataset query is wrapped in a SQL |
Combined queries (large datasets) | For visualization queries sent to the backend, separate visualization queries against the same dataset that share the same |
Parameters substitute values directly into a query at runtime, so these operations always appear in the query history.
Caching and data freshness
Dashboards maintain a 24-hour result cache to optimize initial loading times, operating on a best-effort basis. This means that while the system always attempts to use historical query results linked to dashboard credentials to enhance performance, there are some cases where cached results cannot be created or maintained. Cached data has no specific memory limit or fixed query count.
To improve loading times, dashboards first check the dashboard cache. If no cache results are available, they check the generic query result cache. While the dashboard cache can return stale results for up to 24 hours, the query result cache never returns stale data. When the underlying data changes, all query result cache entries are invalidated.
For multi-page dashboards, the following apply:
- Editing a draft dashboard loads and caches all datasets.
- When viewers open a published dashboard, only datasets that support the active page are run and cached.
- If a schedule is set, all datasets refresh according to the schedule, and those results are cached.
The following table explains how caching varies by dashboard status and credentials:
Dashboard type | Caching type |
---|---|
Published dashboard with embedded credentials | Shared cache. All viewers see the same results. |
Draft dashboard or published dashboard without embedded credentials | Per user cache. Viewers see results based on their data permissions. |
Dashboards automatically use cached query results if the underlying data remains unchanged after the last query or if the results were retrieved less than 24 hours ago. If stale results exist and parameters are applied to the dashboard, queries will rerun unless the same parameters were used in the past 24 hours. Similarly, applying filters to datasets exceeding 100,000 rows prompts queries to rerun unless the same filters were previously applied in the last 24 hours.
Scheduled queries
Adding a schedule to a published dashboard with embedded credentials can significantly speed up the initial loading process for all dashboard viewers.
For each scheduled dashboard update, the following occurs:
- All SQL logic that defines datasets runs on the designated time interval.
- Results populate the query result cache and help to improve initial dashboard load time.