Incremental refresh for materialized views
This article outlines the semantics and requirements for incremental refreshes on materialized views, and identifies the SQL operations, keywords, and clauses that support incremental refresh. It includes discussion of the differences between incremental and full refreshes, and includes recommendations for choosing between materialized views and streaming tables.
When running updates on materialized views using serverless pipelines, many queries can be incrementally refreshed. Incremental refreshes save compute costs by detecting changes in the data sources used to define the materialized view and incrementally computing the result.
Refreshes run on serverless compute
Refresh operations are run on serverless pipelines, regardless if the operation was defined in Databricks SQL or with Lakeflow Declarative Pipelines.
For materialized views defined using Databricks SQL, your workspace does not need to be enabled for serverless Lakeflow Declarative Pipelines. The refresh will use a serverless pipeline automatically.
For materialized views defined using Lakeflow Declarative Pipelines, you must configure the pipeline to use serverless. See Configure a serverless pipeline.
What are the refresh semantics for materialized views?
Materialized views guarantee equivalent results to batch queries. For example, consider the following aggregate query:
SELECT account_id,
  COUNT(txn_id) txn_count,
  SUM(txn_amount) account_revenue
FROM transactions_table
GROUP BY account_id
When you run this query using any Databricks product, the result is computed using batch semantics to aggregate all records in the source transactions_table, meaning that all source data is scanned and aggregated in one operation.
Some Databricks products cache results automatically within or across sessions if data sources have not changed after the last query has run. Automatic caching behaviors differ from materialized views.
The following example turns this batch query into a materialized view:
CREATE OR REPLACE MATERIALIZED VIEW transaction_summary AS
SELECT account_id,
  COUNT(txn_id) txn_count,
  SUM(txn_amount) account_revenue
FROM transactions_table
GROUP BY account_id
When you refresh a materialized view, the computed result is identical to the batch query semantics. This query is an example of a materialized view that can be incrementally refreshed, meaning that the refresh operation makes a best-effort attempt to only process new or changed data in the source transactions_table to compute the results.
Data source considerations for materialized views
While you can define a materialized view against any data source, not all data sources are well-suited to materialized views. Consider the following caveats and recommendations:
Materialized views make a best-effort attempt to incrementally refresh results for supported operations. Some changes in data sources require a full refresh.
All data sources for materialized views should be robust to full refresh semantics, even if the query that defines the materialized view supports incremental refresh.
- For queries where a full refresh would be cost-prohibitive, use streaming tables to guarantee exactly-once processing. Examples include very large tables.
- Do not define a materialized view against a data source if records should only be processed once. Instead, use streaming tables. Examples include the following:
- Data sources that don't retain data history, such as Kafka.
- Ingest operations, such as queries that use Auto Loader to ingest data from cloud object storage.
- Any data source where you plan to delete or archive data after processing but need to retain information in downstream tables. For example, a date-partitioned table where you plan to delete records older than a certain threshold.
 
- Not all data sources support incremental refreshes. The following data sources support incremental refresh:
- Delta tables, including Unity Catalog managed tables and external tables backed by Delta Lake.
- Materialized views.
- Streaming tables, including the targets of AUTO CDC ... INTOoperations.
 
- Some incremental refresh operations require row-tracking to be enabled on the queried data sources. Row-tracking is a Delta Lake feature only supported by Delta tables, which include materialized views, streaming tables, and Unity Catalog managed tables. See Use row tracking for Delta tables.
Optimize materialized views
To get the best performance, Databricks recommends enabling the following features on all materialized view source tables:
You can set these features during creation, or later with the ALTER TABLE statement. For example:
ALTER TABLE <table-name> SET TBLPROPERTIES (
  delta.enableDeletionVectors = true,
  delta.enableRowTracking = true,
  delta.enableChangeDataFeed = true);
Refresh types for materialized views
When a materialized view is updated, you can specify a refresh or a full refresh.
- A refresh attempts to do an incremental refresh, but will do a full recompute of the data if needed. Incremental refresh is only available when the compute you are connected to is serverless.
- A full refresh always recalculates all inputs to the materialized view, and resets all checkpoints.
To determine which refresh type an update used, see Determine the refresh type of an update.
Default refresh
The default refresh for a materialized view on serverless attempts to perform an incremental refresh. An incremental refresh processes changes in the underlying data after the last refresh and then appends that data to the table. Depending on the base tables and included operations, only certain types of materialized views can be incrementally refreshed. If an incremental refresh is not possible or the connected compute is classic instead of serverless, a full recompute is performed.
The output of an incremental refresh and a full recompute are the same. Databricks runs a cost analysis to choose the cheaper option between an incremental refresh and a full recompute.
Only materialized views updated using serverless pipelines can use incremental refresh. Materialized views that do not use serverless pipelines are always fully recomputed.
When you create materialized views with a SQL warehouse or serverless Lakeflow Declarative Pipelines, Databricks incrementally refreshes them if their queries are supported. If a query uses unsupported expressions, Databricks runs a full recompute instead, which can increase costs.
To determine which refresh type an update used, see Determine the refresh type of an update.
Full refresh
A full refresh overwrites the results in the materialized view by clearing the table and checkpoints, and reprocessing all data available in the source.
To perform a full refresh on materialized views defined using Databricks SQL, use the following syntax:
REFRESH MATERIALIZED VIEW mv_name FULL
For materialized views defined in Lakeflow Declarative Pipelines, you can choose to run a full refresh on selected datasets or on all datasets in a pipeline. See Pipeline refresh semantics.
When a full refresh runs against a data source where records have been removed due to data retention threshold or manual deletion, removed records are not reflected in computed results. You may be unable to recover old data if the data is no longer available in the source. This might also change the schema for columns that no longer exist in the source data.
Support for materialized view incremental refresh
The following table lists support for incremental refresh by SQL keyword or clause.
Some keywords and clauses require row-tracking to be enabled on the queried data sources. See Use row tracking for Delta tables.
These keywords and clauses are marked with a star (*) in the following table.
| SQL keyword or clause | Support for incremental refresh | 
|---|---|
| 
 | Yes, expressions including deterministic built-in functions and immutable user-defined functions (UDFs) are supported. | 
| 
 | Yes | 
| 
 | Yes, common table expressions are supported. | 
| 
 | Yes | 
| 
 | Supported base tables include Delta tables, materialized views, and streaming tables. | 
| 
 | Filter clauses such as  | 
| 
 | Yes | 
| 
 | Yes | 
| 
 | Yes | 
| 
 | Yes | 
| 
 | Yes.  | 
| 
 | Yes | 
| 
 | Yes, materialized views that include expectations can be incrementally refreshed. However, incremental refresh is not supported for the following cases: 
 | 
| Non-deterministic functions | Non-deterministic time functions are supported in  | 
| Non-Delta sources | Sources such as volumes, external locations, and foreign catalogs are not supported. | 
Determine the refresh type of an update
To optimize the performance of materialized view refreshes, Databricks uses a cost model to select the technique used for the refresh. The following table describes these techniques:
| Technique | Incremental refresh? | Description | 
|---|---|---|
| 
 | No | The materialized view was fully recomputed | 
| 
 | Not applicable | The materialized view was not updated because no changes to the base table were detected. | 
| Any of: 
 | Yes | The materialized view was incrementally refreshed using the specified technique. | 
To determine the technique used, query the Lakeflow Declarative Pipelines event log where the event_type is planning_information:
SELECT
  timestamp,
  message
FROM
  event_log(TABLE(<fully-qualified-table-name>))
WHERE
  event_type = 'planning_information'
ORDER BY
  timestamp desc;
Replace <fully-qualified-table-name> with the fully qualified name of the materialized view, including the catalog and schema.
Sample output for this command:
- 
- timestamp
- message
 
- 
- 2025-03-21T22:23:16.497+00:00
- Flow 'sales' has been planned in :re[LDP] to be executed as ROW_BASED.