CREATE MATERIALIZED VIEW (Lakeflow Declarative Pipelines)

A materialized view is a view where precomputed results are available for query and can be updated to reflect changes in the input. materialized views are backed by Lakeflow Declarative Pipelines. Each time a materialized view is updated, query results are recalculated to reflect changes in upstream datasets. You can updated materialized views manually or on a schedule.

To learn more about how to perform or schedule updates, see Run an update in Lakeflow Declarative Pipelines.

Syntax

CREATE [OR REFRESH] [PRIVATE] MATERIALIZED VIEW
  view_name
  [ column_list ]
  [ view_clauses ]
  AS query

column_list
   ( { column_name column_type column_properties } [, ...]
    [ column_constraint ] [, ...]
    [ , table_constraint ] [...] )

   column_properties
      { NOT NULL | COMMENT column_comment | column_constraint | MASK clause } [ ... ]

view_clauses
  { USING DELTA |
    PARTITIONED BY (col [, ...]) |
    CLUSTER BY clause |
    LOCATION path |
    COMMENT view_comment |
    TBLPROPERTIES clause |
    WITH { ROW FILTER clause } } [...]

Parameters

REFRESH

If specified, will create the view, or update an existing view and its content.
PRIVATE

Creates a private materialized view. A private materialized view can be useful as an intermediate table within a pipeline that you do not want published to the catalog.
- They are not added to the catalog and are only accessible within the defining pipeline
- They can have the same name as an existing object in the catalog. Within the pipeline, if a private materialized view and an object in the catalog have the same name, references to the name will resolve to the private materialized view.
- Private materialized views are only persisted across the lifetime of the pipeline, not just a single update.

Private materialized views were previously created with the TEMPORARY parameter.

view_name

The name of the newly created view. The fully qualified view name must be unique.

Private materialized views can have the same name as an object that is published in the catalog.
column_list

Optionally labels the columns in the query result of the view. If you provide a column list the number of column aliases must match the number of expressions in the query. If no column list is specified, aliases are derived from the body of the view.
- column_name
  
  The column names must be unique and map to the output columns of the query.
- column_type
  
  Specifies the column's data type. Not all data types supported by Databricks are supported by materialized views.
- column_comment
  
  An optional STRING literal describing the column. This option must be specified along with column_type. If the column type is not specified, the column comment is skipped.
- column_constraint
  
  Adds a constraint that validates data as it flows into the table. See Manage data quality with pipeline expectations.
- MASK clause
  
  Preview
  This feature is in Public Preview.
  
  Adds a column mask function to anonymize sensitive data. See Row filters and column masks.
table_constraint

Preview
This feature is in Public Preview.

When specifying a schema, you can define primary and foreign keys. The constraints are informational and are not enforced. See the CONSTRAINT clause in the SQL language reference.

note
To define table constraints, your pipeline must be a Unity Catalog-enabled pipeline.
view_clauses

Optionally specify partitioning, comments, and user defined properties for the materialized view. Each sub clause may only be specified once.
- USING DELTA
  
  Specifies the data format. The only option is DELTA.
  
  This clause is optional, and defaults to DELTA.
- PARTITIONED BY
  
  An optional list of one or more columns to use for partitioning in the table. Mutually exclusive with CLUSTER BY.
  
  Liquid clustering provides a flexible, optimized solution for clustering. Consider using CLUSTER BY instead of PARTITIONED BY for Lakeflow Declarative Pipelines.
- CLUSTER BY
  
  Enable liquid clustering on the table and define the columns to use as clustering keys. Mutually exclusive with PARTITIONED BY.
  
  See Use liquid clustering for tables.
- LOCATION
  
  An optional storage location for table data. If not set, the system will default to the pipeline storage location.
  
  This option is only available when publishing to Hive metastore. In Unity Catalog, the location is managed automatically.
- COMMENT
  
  An optional description for the table.
- TBLPROPERTIES
  
  An optional list of table properties for the table.
- WITH ROW FILTER
Preview
This feature is in Public Preview.

Adds a row filter function to the table. Future queries for that table receive a subset of the rows for which the function evaluates to TRUE. This is useful for fine-grained access control, because it allows the function to inspect the identity and group memberships of the invoking user to decide whether to filter certain rows.

See ROW FILTER clause.
query

A Lakeflow Declarative Pipelines query that defines the dataset for the table.

Required permissions

The run-as user for a pipeline must have the following permissions:

SELECT privilege over the base tables referenced by the materialized view.
USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
CREATE MATERIALIZED VIEW privilege on the schema for the materialized view.

For a user to be able to update the pipeline the materialized view is defined within, they require:

USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
Ownership of the materialized view or REFRESH privilege on the materialized view.
The owner of the materialized view must have the SELECT privilege over the base tables referenced by the materialized view.

For a user to be able to query the resulting materialized view, they require:

USE CATALOG privilege on the parent catalog and the USE SCHEMA privilege on the parent schema.
SELECT privilege over the materialized view.

Limitations

When a materialized view with a sum aggregate over a NULL-able column has the last non-NULL value removed from that column - and thus only NULL values remain in that column - the materialized view's resultant aggregate value returns zero instead of NULL.
Column-reference does not require an alias. Non-column reference expressions require an alias, as in the following example:
- Allowed: SELECT col1, SUM(col2) AS sum_col2 FROM t GROUP BY col1
- Not Allowed: SELECT col1, SUM(col2) FROM t GROUP BY col1
NOT NULL must be manually specified along with PRIMARY KEY in order to be a valid statement.
Materialized views do not support identity columns or surrogate keys.
Materialized views do not support OPTIMIZE and VACUUM commands. Maintenance happens automatically.
Renaming the table or changing the owner is not supported.
Generated columns, identity columns, and default columns are not supported.

Examples

SQL
-- Create a materialized view by reading from an external data source, using the default schema:
CREATE OR REFRESH MATERIALIZED VIEW taxi_raw
AS SELECT * FROM read_files("/databricks-datasets/nyctaxi/sample/json/")

-- Create a materialized view by reading from a dataset defined in a pipeline:
CREATE OR REFRESH MATERIALIZED VIEW filtered_data
AS SELECT
  ...
FROM taxi_raw

-- Specify a schema and clustering columns for a table:
CREATE OR REFRESH MATERIALIZED VIEW sales
(customer_id STRING,
  customer_name STRING,
  number_of_line_items STRING,
  order_datetime STRING,
  order_number LONG,
  order_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime))
) CLUSTER BY (order_day_of_week, customer_id)
COMMENT "Raw data on sales"
AS SELECT * FROM ...

-- Specify partition columns for a table:
CREATE OR REFRESH MATERIALIZED VIEW sales
(customer_id STRING,
  customer_name STRING,
  number_of_line_items STRING,
  order_datetime STRING,
  order_number LONG,
  order_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime))
) PARTITIONED BY (order_day_of_week)
COMMENT "Raw data on sales"
AS SELECT * FROM ...

-- Specify a primary and foreign key constraint for a table:
CREATE OR REFRESH MATERIALIZED VIEW sales
(customer_id STRING NOT NULL PRIMARY KEY,
  customer_name STRING,
  number_of_line_items STRING,
  order_datetime STRING,
  order_number LONG,
  order_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime)),
  CONSTRAINT fk_customer_id FOREIGN KEY (customer_id) REFERENCES main.default.customers(customer_id)
)
COMMENT "Raw data on sales"
AS SELECT * FROM ...

-- Specify a row filter and mask clause for a table:
CREATE OR REFRESH MATERIALIZED VIEW sales (
  customer_id STRING MASK catalog.schema.customer_id_mask_fn,
  customer_name STRING,
  number_of_line_items STRING COMMENT 'Number of items in the order',
  order_datetime STRING,
  order_number LONG,
  order_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime))
)
COMMENT "Raw data on sales"
WITH ROW FILTER catalog.schema.order_number_filter_fn ON (order_number)
AS SELECT * FROM sales_bronze

Syntax​

Parameters​

Required permissions​

Limitations​

Examples​

Syntax

Parameters

Required permissions

Limitations

Examples