CREATE MATERIALIZED VIEW

Applies to: Databricks SQL

A materialized view is a view where precomputed results are available for query and can be updated to reflect changes in the input. Each time a materialized view is refreshed, query results are recalculated to reflect changes in upstream datasets. All materialized views are backed by an ETL pipeline. You can refresh materialized views manually or on a schedule.

To learn more about how to perform a manual refresh, see REFRESH (MATERIALIZED VIEW or STREAMING TABLE).

To learn more about how to schedule a refresh, see Examples or ALTER MATERIALIZED VIEW.

Materialized views can only be created using a Pro or Serverless SQL warehouse, or within a pipeline.

注記

Create and refresh operations on materialized views and streaming tables are powered by serverless Lakeflow Spark Declarative Pipelines. You can use Catalog Explorer to view details about the backing pipelines in the UI. See What is Catalog Explorer?.

Syntax

{ CREATE OR REPLACE MATERIALIZED VIEW | CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] }
  view_name
  [ column_list ]
  [ view_clauses ]
  AS query

column_list
   ( { column_name column_type column_properties } [, ...]
      [ , table_constraint ] [...])

   column_properties
      { NOT NULL | COMMENT column_comment | column_constraint | MASK clause } [ ... ]

view_clauses
  { PARTITIONED BY (col [, ...]) |
    CLUSTER BY clause |
    COMMENT view_comment |
    DEFAULT COLLATION UTF8_BINARY |
    TBLPROPERTIES clause |
    SCHEDULE [ REFRESH ] schedule_clause |
    schedule |
    WITH { ROW FILTER clause } } [...]

schedule
  { SCHEDULE [ REFRESH ] schedule_clause |
    TRIGGER ON UPDATE [ AT MOST EVERY trigger_interval ] }

schedule_clause
  { EVERY number { HOUR | HOURS | DAY | DAYS | WEEK | WEEKS } |
    CRON cron_string [ AT TIME ZONE timezone_id ] }

Parameters

REPLACE

If specified, replaces the view and its content if it already exists.
IF NOT EXISTS

Creates the view if it does not exist. If a view by this name already exists, the CREATE MATERIALIZED VIEW statement is ignored.

You may specify at most one of IF NOT EXISTS or OR REPLACE.
view_name

The name of the newly created view. The fully qualified view name must be unique.
column_list

Optionally labels the columns in the query result of the view. If you provide a column list the number of column aliases must match the number of expressions in the query. If no column list is specified, aliases are derived from the body of the view.
- column_name
  
  The column names must be unique and map to the output columns of the query.
- column_type
  
  Specifies the column's data type. Not all data types supported by Databricks are supported by materialized views.
- column_comment
  
  An optional STRING literal describing the column. This option must be specified along with column_type. If the column type is not specified, the column comment is skipped.
- column_constraint
  
  Adds an informational primary key or informational foreign key constraint to the column in a materialized view. If the column type is not specified, the column constraint is skipped.
- MASK clause
  
  Adds a column mask function to anonymize sensitive data. All subsequent queries from that column receive the result of evaluating that function over the column in place of the column's original value. This can be useful for fine-grained access control purposes where the function can inspect the identity or group memberships of the invoking user to determine whether to redact the value. If the column type is not specified, the column mask is skipped.
table_constraint

Adds an informational primary key or informational foreign key constraint to the table in a materialized view. If the column type is not specified, the table constraint is skipped.
view_clauses

Optionally specify partitioning, comments, user defined properties, and a refresh schedule for the new materialized view. Each sub clause may only be specified once.
- PARTITIONED BY
  
  An optional list of columns of the table to partition the table by.
  
  注記
  Liquid clustering provides a flexible, optimized solution for clustering. Consider using CLUSTER BY instead of PARTITIONED BY for materialized views.
- CLUSTER BY
  
  An optional clause to cluster by a subset of columns. Use automatic liquid clustering with CLUSTER BY AUTO, and Databricks intelligently chooses clustering keys to optimize query performance. See Use liquid clustering for tables.
  
  Liquid clustering cannot be combined with PARTITIONED BY.
- COMMENT view_comment
  
  A STRING literal to describe the table.
- DEFAULT COLLATION UTF8_BINARY
  
  Applies to: Databricks SQL Databricks Runtime 17.1 and above
  
  Forces the default collation of the materialized view to UTF8_BINARY. This clause is mandatory if the schema in which the view is created has a default collation other than UTF8_BINARY. The default collation of the materialized view is used as the default collation within the view body.
- TBLPROPERTIES
  
  Optionally sets one or more user defined properties.
  
  Use this setting to specify the Lakeflow Spark Declarative Pipelines runtime channel used to run this statement. Set the value of the pipelines.channel property to "PREVIEW" or "CURRENT". The default value is "CURRENT". For more information about Lakeflow Spark Declarative Pipelines channels, see Lakeflow Spark Declarative Pipelines runtime channels.
- schedule
  
  The schedule can either be a SCHEDULE statement or a TRIGGER statement.
  - SCHEDULE [ REFRESH ] schedule_clause
    - EVERY number { HOUR | HOURS | DAY | DAYS | WEEK | WEEKS }
      
      To schedule a refresh that occurs periodically, use EVERY syntax. If EVERY syntax is specified, the streaming table or materialized view is refreshed periodically at the specified interval based on the provided value, such as HOUR, HOURS, DAY, DAYS, WEEK, or WEEKS. The following table lists accepted integer values for number.
      
      Time unit
      Integer value
      HOUR or HOURS
      1 <= H <= 72
      DAY or DAYS
      1 <= D <= 31
      WEEK or WEEKS
      1 <= W <= 8
      
      注記
      The singular and plural forms of the included time unit are semantically equivalent.
    - CRON cron_string [ AT TIME ZONE timezone_id ]
      
      To schedule a refresh using a quartz cron value. Valid time_zone_values are accepted. AT TIME ZONE LOCAL is not supported.
      
      If AT TIME ZONE is absent, the session time zone is used. If AT TIME ZONE is absent and the session time zone is not set, an error is thrown. SCHEDULE is semantically equivalent to SCHEDULE REFRESH.
  - TRIGGER ON UPDATE [ AT MOST EVERY trigger_interval ]
    
    Beta
    The TRIGGER ON UPDATE feature is in Beta.
    
    Optionally set the table to refresh when an upstream data source is updated, at most once every minute. Set a value for AT MOST EVERY to require at least a minimum time between refreshes.
    
    The upstream data sources must be either external or managed Delta tables (including materialized views or streaming tables), or managed views whose dependencies are limited to supported table types. Delta Sharing shared tables are not supported as upstream sources.
    
    Enabling file events can make triggers more performant, and increases some of the limits on trigger updates.
    
    The trigger_interval is an INTERVAL statement that is at least 1 minute.
    
    TRIGGER ON UPDATE has the following limitations
    - No more than 10 upstream data sources per materialized view when using TRIGGER ON UPDATE.
    - Maximum of 1000 streaming tables or materialized views can be specified with TRIGGER ON UPDATE.
    - The AT MOST EVERY clause defaults to 1 minute, and cannot be less than 1 minute.
- WITH ROW FILTER clause
  
  Adds a row filter function to the table. All subsequent queries from that table receive a subset of the rows for which the function evaluates to boolean TRUE. This can be useful for fine-grained access control purposes where the function can inspect the identity or group memberships of the invoking user to determine whether to filter certain rows.
AS query

A query that constructs the view from base tables or other views.

Time unit	Integer value
`HOUR or HOURS`	1 <= H <= 72
`DAY or DAYS`	1 <= D <= 31
`WEEK or WEEKS`	1 <= W <= 8

Required permissions

The user who creates a materialized view (MV) is the MV owner and needs to have the following permissions:

SELECT privilege over the base tables referenced by the MV.
USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
CREATE MATERIALIZED VIEW privilege on the schema for the MV.

For a user to be able to refresh the MV, they require:

USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
Ownership of the MV or REFRESH privilege on the MV.
The owner of the MV must have the SELECT privilege over the base tables referenced by the MV.

For a user to be able to query the MV, they require:

USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
SELECT privilege over the materialized view.

Row filters and column masks

Row filters let you specify a function that applies as a filter whenever a table scan fetches rows. These filters ensure that subsequent queries only return rows for which the filter predicate evaluates to true.

Column masks let you mask a column's values whenever a table scan fetches rows. All future queries involving that column will receive the result of evaluating the function over the column, replacing the column's original value.

For more information on how to use row filters and column masks, see Row filters and column masks.

Managing Row Filters and Column Masks

Row filters and column masks on materialized views should be added through the CREATE statement.

Behavior

Refresh as Definer: When the REFRESH MATERIALIZED VIEW statement refreshes a materialized view, row filter functions run with the definer's rights (as the table owner). This means the table refresh uses the security context of the user who created the materialized view.
Query: While most filters run with the definer's rights, functions that check user context (such as CURRENT_USER and IS_MEMBER) are exceptions. These functions run as the invoker. This approach enforces user-specific data security and access controls based on the current user's context.
When creating materialized views over source tables that contain row filters and column masks, the refresh of the materialized view is always a full refresh. A full refresh reprocesses all data available in the source with the latest definitions. This ensures that security policies on the source tables are evaluated and applied with the most up-to-date data and definitions.

Observability

Use DESCRIBE EXTENDED, INFORMATION_SCHEMA, or the Catalog Explorer to examine the existing row filters and column masks that apply to a given materialized view. This functionality allows users to audit and review data access and protection measures on materialized views.

Limitations

When a materialized view with a sum aggregate over a NULL-able column has the last non-NULL value removed from that column - and thus only NULL values remain in that column - the materialized view's resultant aggregate value returns zero instead of NULL.
Column-reference does not require an alias. Non-column reference expressions require an alias, as in the following example:
- Allowed: SELECT col1, SUM(col2) AS sum_col2 FROM t GROUP BY col1
- Not Allowed: SELECT col1, SUM(col2) FROM t GROUP BY col1
NOT NULL must be manually specified along with PRIMARY KEY in order to be a valid statement.
Materialized views do not support identity columns or surrogate keys.
Materialized views do not support OPTIMIZE and VACUUM commands. Maintenance happens automatically.
Materialized views do not support expectations to define data quality constraints.

Examples

SQL
-- Create a materialized view if it doesn't exist
> CREATE MATERIALIZED VIEW IF NOT EXISTS subscribed_movies
  AS SELECT mo.member_id, mb.full_name, mo.movie_title
       FROM movies AS mo INNER JOIN members AS mb ON mo.member_id = mb.id;

-- Create and schedule a materialized view to be refreshed whenever the
-- upstream data is updated
> CREATE MATERIALIZED VIEW IF NOT EXISTS subscribed_movies
  TRIGGER ON UPDATE
  AS SELECT mo.member_id, mb.full_name, mo.movie_title
       FROM movies AS mo INNER JOIN members AS mb ON mo.member_id = mb.id;

-- Create and schedule a materialized view to be refreshed every day.
-- Note: All columns in a GROUP BY need to be explicitly aliased
> CREATE MATERIALIZED VIEW daily_sales
  COMMENT 'Daily sales numbers'
  SCHEDULE EVERY 1 DAY
  AS SELECT date AS date, sum(sales) AS sumOfSales
       FROM table1
       GROUP BY date;

-- Sets the runtime channel to "PREVIEW"
> CREATE MATERIALIZED VIEW mv_preview
  TBLPROPERTIES(pipelines.channel = "PREVIEW")
  AS SELECT * FROM RANGE(10)

-- Create a materialized view with a table constraint
> CREATE MATERIALIZED VIEW IF NOT EXISTS subscribed_movies(
    member_id int NOT NULL,
    full_name string,
    movie_title string,
    CONSTRAINT movie_pk PRIMARY KEY(member_id)
  )
  AS SELECT mo.member_id, mb.full_name, mo.movie_title
       FROM movies AS mo INNER JOIN members AS mb ON mo.member_id = mb.id;

-- Create or replace the materialized view to remove the table constraint and add a partition
> CREATE OR REPLACE MATERIALIZED VIEW subscribed_movies
  PARTITIONED BY (member_id)
  AS SELECT mo.member_id, mb.full_name, mo.movie_title
       FROM movies AS mo INNER JOIN members AS mb ON mo.member_id = mb.id;

-- Create a materialized view with a row filter and a column mask
> CREATE MATERIALIZED VIEW masked_view (
    id int,
    name string,
    region string,
    ssn string MASK catalog.schema.ssn_mask_fn
  )
  WITH ROW FILTER catalog.schema.us_filter_fn ON (region)
  AS SELECT id, name, region, ssn
       FROM employees;

REFRESH

Syntax​

Parameters​

Required permissions​

Row filters and column masks​

Managing Row Filters and Column Masks​

Behavior​

Observability​

Limitations​

Examples​

Related articles​