Use materialized views in Databricks SQL

This article describes how to create and refresh materialized views in Databricks SQL to improve performance and reduce the cost of your data processing and analysis workloads.

What are materialized views?

In Databricks SQL, materialized views are Unity Catalog managed tables that physically store the results of a query. Unlike standard views, which compute results on demand, materialized views cache the results and update them as the underlying source tables change—either on a schedule or automatically.

Materialized views are well-suited for data processing workloads such as extract, transform, and load (ETL) processing. Materialized views provide a simple, declarative way to process data for compliance, corrections, aggregations, or general change data capture (CDC). Materialized views also enable easy-to-use transformations by cleaning, enriching, and denormalizing base tables. By pre-computing expensive or frequently used queries, Materialized views lower query latency and resource consumption. In many cases, they can incrementally compute changes from source tables, further improving efficiency and end-user experience.

The following are common use cases for materialized views:

Keeping a BI dashboard up to date with minimal end-user query latency.
Reducing complex ETL orchestration with simple SQL logic.
Building complex, layered transformations.
Any use cases that demand consistent performance with up-to-date insights.

When you create a materialized view in a Databricks SQL warehouse, a serverless pipeline is created to process the create and refreshes to the materialized view. You can monitor the status of refresh operations in Catalog Explorer. See View materialized view details with DESCRIBE EXTENDED.

Requirements

Materialized views created in Databricks SQL are backed by a serverless pipeline. Your workspace must support serverless pipelines to use this functionality.

Requirements to create or refresh materialized views:

You must use a Unity Catalog-enabled pro or serverless SQL warehouse.
To refresh a materialized view, you must be in the workspace that created it.
To incrementally refresh a materialized view from Delta tables, the source tables must have row tracking enabled.
The owner (the user who creates the materialized view) must have the following permissions:
- SELECT privilege on the base tables referenced by the materialized view.
- USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the source tables for the materialized view.
- USE CATALOG and USE SCHEMA privileges on the target catalog and schema for the materialized view.
- CREATE TABLE and CREATE MATERIALIZED VIEW privileges on the schema containing the materialized view.
To refresh a materialized view, you must have the REFRESH privilege on the materialized view.

Your workspace must be in a region that supports serverless SQL warehouses.
You must have accepted the serverless terms of use.

Requirements to query materialized views:

You must be the owner of the materialized view, or have SELECT on the materialized view, along with USE SCHEMA and USE CATALOG on its parents.
You must use one of the following compute resources:
- SQL warehouse
- Lakeflow Declarative Pipelines interfaces
- Standard access mode compute (formerly shared access mode)
- Dedicated access mode (formerly single user access mode) on Databricks Runtime 15.4 and above, as long as the workspace is enabled for serverless compute. See Fine-grained access control on dedicated compute.
  
  If you are the materialized view owner, you can use a Dedicated access mode compute resource that is running Databricks Runtime between 14.3 and above.

To learn about other restrictions on using materialized views, see Limitations.

Create a materialized view

Databricks SQL materialized view CREATE operations use a Databricks SQL warehouse to create and load data in the materialized view. Creating a materialized view is a synchronous operation, which means that the CREATE MATERIALIZED VIEW command blocks until the materialized view is created and the initial data load finishes. A serverless pipeline is automatically created for every Databricks SQL materialized view. When the materialized view is refreshed Lakeflow Declarative Pipelines processes the refresh.

To create a materialized view, use the CREATE MATERIALIZED VIEW statement. To submit a create statement, use the SQL editor in the Databricks UI, the Databricks SQL CLI, or the Databricks SQL API.

The user who creates a materialized view is the materialized view owner.

The following example creates the materialized view mv1 from the base table base_table1:

SQL
-- This query defines the materialized view:
CREATE OR REPLACE MATERIALIZED VIEW mv1
AS SELECT
  date,
  sum(sales) AS sum_of_sales
FROM
  base_table1
GROUP BY
  date;

When you create a materialized view using the CREATE OR REPLACE MATERIALIZED VIEW statement, the initial data refresh and population begins immediately. This does not consume SQL warehouse compute. Instead, serverless Lakeflow Declarative Pipelines is used for creation and subsequent refreshes.

Column comments on a base table are automatically propagated to the new materialized view on create only. To add a schedule, table constraints, or other properties, modify the materialized view definition (the SQL query).

The same SQL statement will refresh a materialized view if called a subsequent time, or on a schedule. A refresh done in this way acts as any other refresh. For details, see Refresh a materialized view.

To learn more about configuring a materialized view, see Configure materialized views in Databricks SQL. To learn about the complete syntax for creating a materialized view, see CREATE MATERIALIZED VIEW. To learn about loading data in different formats and from different places, see Load data with Lakeflow Declarative Pipelines.

Load data from external systems

Databricks recommends loading external data using Lakehouse Federation for supported data sources. For information on loading data from sources not supported by Lakehouse Federation, see Data format options. For general information about loading data, including examples, see Load data with Lakeflow Declarative Pipelines.

Hide sensitive data

Preview

This feature is in Public Preview.

You can use materialized views to hide sensitive data from users accessing the table. One way to do this is to create the query so it doesn't include that data in the first place. But you can also mask columns or filter rows based on the permissions of the querying user. For example, you could hide the tax_id column for users that are not in the group HumanResourcesDept. To do this, use the ROW FILTER and MASK syntax during the creation of the materialized view. For more information, see Row filters and column masks.

Refresh a materialized view

Refreshing a materialized view updates the view to reflect the latest changes to the base table at the time of the refresh.

When you define a materialized view, the CREATE OR REPLACE MATERIALIZED VIEW statement is used both to create the view, and to refresh it for any scheduled refreshes. You can also use the REFRESH MATERIALIZED VIEW statement to refresh the materialized view without needing to supply the query again. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE) for details on the SQL syntax and parameters for this command. To learn more about the types of materialized views that can be incrementally refreshed, see Incremental refresh for materialized views.

To submit a refresh statement, use the SQL editor in the Databricks UI, a notebook attached to a SQL warehouse, the Databricks SQL CLI, or the Databricks SQL API.

The owner, and any user who has been granted the REFRESH privilege on the table, can refresh the materialized view.

The following example refreshes the mv1 materialized view:

SQL
REFRESH MATERIALIZED VIEW mv1;

The operation is synchronous by default, meaning that the command blocks until the refresh operation is complete. To refresh asynchronously, you can add the ASYNC keyword:

SQL
REFRESH MATERIALIZED VIEW mv1 ASYNC;

How are Databricks SQL materialized views refreshed?

Materialized views automatically create and use serverless Lakeflow Declarative Pipelines to process refresh operations. The refresh is managed by the pipeline and the update is monitored by the Databricks SQL warehouse used to create the materialized view. Materialized views can be updated using a pipeline that runs on a schedule. Databricks SQL created materialized views always run in triggered mode. See Triggered vs. continuous pipeline mode.

Materialized views are refreshed using one of two methods.

Incremental refresh - The system evaluates the view's query to identify changes that happened after the last update and merges only the new or modified data.
Full refresh - If an incremental refresh can't be performed, the system runs the entire query and replaces the existing data in the materialized view with the new results.

The structure of the query and the type of source data determine whether incremental refresh is supported. To support incremental refresh, source data should be stored in Delta tables, with row tracking and change data feed enabled. After you create a materialized view, you can monitor its refresh behavior to verify whether it is updated incrementally or through a full refresh.

For details on refresh types, and how to optimize for incremental refreshes, see Incremental refresh for materialized views.

Asynchronous refreshes

By default, refresh operations are performed synchronously. You can also set a refresh operation to occur asynchronously. This can be set using the refresh command with the ASYNC keyword. See REFRESH (MATERIALIZED VIEW or STREAMING TABLE) The behavior associated with each approach is as follows:

Synchronous: A synchronous refresh prevents other operations from proceeding until the refresh is complete. If the result is needed for the next step, such as when sequencing refresh operations in orchestration tools like Lakeflow Jobs, use a synchronous refresh. To orchestrate materialized views with a job, use the SQL task type. See Lakeflow Jobs.
Asynchronous: An asynchronous refresh starts a background job on Lakeflow Declarative Pipelines compute when a materialized view refresh begins, allowing the command to return before the data load completes. This refresh type can save on cost because the operation doesn't necessarily hold compute capacity in the warehouse where the command is initiated. If the refresh becomes idle and no other tasks are running, the warehouse can shut down while the refresh uses other available compute. Additionally, asynchronous refreshes support starting multiple operations in parallel.

Schedule materialized view refreshes

You can configure a Databricks SQL materialized view to refresh automatically based on a defined schedule. To set a schedule, do one of the following:

Configure the schedule with the SCHEDULE clause when you create the materialized view
Add a schedule with the ALTER MATERIALIZED VIEW statement.

note

Alternately, you may create a task in a job that includes either the CREATE OR REPLACE MATERIALIZED VIEW or a REFRESH statement, and orchestrate it as you would any other job. See Lakeflow Jobs.

The following example creates the materialized view mv1 from the base table base_table1, and a schedule to refresh the materialized view once an hour:

SQL
CREATE OR REPLACE MATERIALIZED VIEW mv1
  SCHEDULE EVERY 1 hour
  AS SELECT
    date,
    sum(sales) AS sum_of_sales
  FROM
    base_table1
  GROUP BY
    date;

To set or change the schedule after creation, use the ALTER MATERIALIZED VIEW statement:

SQL
ALTER MATERIALIZED VIEW sales ALTER SCHEDULE EVERY 1 day;

When a schedule is created, a new Databricks job is automatically configured to process the update.

To view the schedule, do one of the following:

Run the DESCRIBE EXTENDED statement from the SQL editor in the Databricks UI. See DESCRIBE TABLE.
Use Catalog Explorer to view the materialized view. The schedule is listed on the Overview tab, under Refresh status. See What is Catalog Explorer?.

When there is a schedule for refreshes, you still have the option to run a manual refresh at any time, if you have a need for updated data.

Stop an active refresh

To stop an active refresh in the Lakeflow Declarative Pipelines UI, in the Pipeline details page click Stop to stop the pipeline update. You can also stop the refresh with the Databricks CLI or the POST /api/2.0/pipelines/{pipeline_id}/stop operation in the Pipelines API.

Permanently delete records from a materialized view with deletion vectors enabled

Preview

Support for the REORG statement with materialized views is in Public Preview.

note

Using a REORG statement with a materialized view requires Databricks Runtime 15.4 and above.
Although you can use the REORG statement with any materialized view, it's only required when deleting records from a materialized view with deletion vectors enabled. The command has no effect when used with a materialized view without deletion vectors enabled.

To physically delete records from the underlying storage for a materialized view with deletion vectors enabled, such as for GDPR compliance, additional steps must be taken to ensure that a VACUUM operation runs on the materialized view's data.

To physically delete records:

Run a REORG statement against the materialized view, specifying the APPLY (PURGE) parameter. For example REORG TABLE <materialized-view-name> APPLY (PURGE);. See REORG TABLE.
Wait for the materialized view's data retention period to pass. The default data retention period is seven days, but it can be configured with the delta.deletedFileRetentionDuration table property. See Configure data retention for time travel queries.
REFRESH the materialized view. See Refresh a materialized view. Within 24 hours of the REFRESH operation, Lakeflow Declarative Pipelines maintenance tasks, including the VACUUM operation which is required to ensure records are permanently deleted, are run automatically.

Drop a materialized view

note

To submit the command to drop a materialized view, you must be the owner of that materialized view or have the MANAGE privilege on the materialized view.

To drop a materialized view, use the DROP VIEW statement. To submit a DROP statement, you can use the SQL editor in the Databricks UI, the Databricks SQL CLI, or the Databricks SQL API. The following example drops the mv1 materialized view:

SQL
DROP MATERIALIZED VIEW mv1;

You can also use the Catalog Explorer to drop a materialized view.

Click Catalog in the sidebar.
In the Catalog Explorer tree at the left, open the catalog and select the schema where your materialized view is located.
Open the Tables item under the schema you selected, and click the materialized view.
On the kebab menu , select Delete.

Understand the costs of a materialized view

Because an materialized view runs in serverless compute, outside of the compute that you set up for a notebook or job, you may wonder how to understand the costs associated with it. Materialized view usage is tracked by DBU consumption. To learn more, see What is the DBU consumption of a materialized view or streaming table?

Enabling row tracking

In order to support incremental refreshes from Delta tables, row tracking must be enabled for those source tables. If you recreate a source table, you must re-enable row tracking.

The following example shows how to enable row tracking on a table:

SQL
ALTER TABLE source_table SET TBLPROPERTIES (delta.enableRowTracking = true);

For more details, see Use row tracking for Delta tables

Limitations

For compute and workspace requirements, see Requirements.
For incremental refresh requirements, see Incremental refresh for materialized views.
Materialized views do not support identity columns or surrogate keys.
If a materialized view uses a sum aggregate over a NULL-able column and only NULL values remain in that column, the materialized views resultant aggregate value is zero instead of NULL.
You cannot read a change data feed from a materialized view.
Time travel queries are not supported on materialized views.
The underlying files supporting materialized views might include data from upstream tables (including possible personally identifiable information) that do not appear in the materialized view definition. This data is automatically added to the underlying storage to support incremental refreshing of materialized views. Because the underlying files of a materialized view might risk exposing data from upstream tables not part of the materialized view schema, Databricks recommends not sharing the underlying storage with untrusted downstream consumers. For example, suppose the definition of a materialized view includes a COUNT(DISTINCT field_a) clause. Even though the materialized view definition only includes the aggregate COUNT DISTINCT clause, the underlying files will contain a list of the actual values of field_a.
You may incur some serverless compute charges, even when using these features on dedicated compute.

If you need to use an AWS PrivateLink connection with your materialized view, contact your Databricks representative.

What are materialized views?​

Requirements​

Requirements to create or refresh materialized views:​

Requirements to query materialized views:​

Create a materialized view​

Load data from external systems​

Hide sensitive data​

Refresh a materialized view​

How are Databricks SQL materialized views refreshed?​

Asynchronous refreshes​

Schedule materialized view refreshes​

Stop an active refresh​

Permanently delete records from a materialized view with deletion vectors enabled​

Drop a materialized view​

Understand the costs of a materialized view​

Enabling row tracking​

Limitations​

Related articles​

What are materialized views?

Requirements

Requirements to create or refresh materialized views:

Requirements to query materialized views:

Create a materialized view

Load data from external systems

Hide sensitive data

Refresh a materialized view

How are Databricks SQL materialized views refreshed?

Asynchronous refreshes

Schedule materialized view refreshes

Stop an active refresh

Permanently delete records from a materialized view with deletion vectors enabled

Drop a materialized view

Understand the costs of a materialized view

Enabling row tracking

Limitations

Related articles