Materialization for metric views
This feature is Experimental.
This article explains how to use materialization for metric views to accelerate query performance.
Materialization for metric views accelerates queries by using materialized views. Lakeflow Spark Declarative Pipelines orchestrates user-defined materialized views for a given metric view. At query time, the query optimizer intelligently routes user queries on the metric view to the best materialized view using automatic aggregate-aware query matching, also known as query rewriting.
This approach provides the benefits of pre-computation and automatic incremental updates. You don't need to determine which aggregation table or materialized view to query for a given performance goal, and you don't need to manage separate production pipelines.
Overview
The following diagram illustrates how metric views handle definition and query execution:

Definition phase
When you define a metric view with materialization, CREATE METRIC VIEW or ALTER METRIC VIEW specifies your dimensions, measures, and refresh schedule. Databricks creates a managed Lakeflow Spark Declarative Pipeline that maintains the materialized views.
Query execution
When you run SELECT ... FROM <metric_view>, the query optimizer uses aggregate-aware query rewriting to optimize performance:
- Fast path: Reads from pre-computed materialized views when applicable.
- Fallback path: Reads from source data directly when materializations are unavailable.
The query optimizer automatically balances performance and freshness by choosing between materialized and source data. You receive results transparently regardless of which path is used.
Requirements
To use materialization for metric views:
- Your workspace must have serverless compute enabled, which is required to run Lakeflow Spark Declarative Pipelines.
- Databricks Runtime 17.2 or above.
Configuration reference
All information related to materialization is defined in a top-level field named materialization in the metric view YAML definition.
The materialization field contains the following required fields:
- schedule: Supports the same syntax as the schedule clause on materialized views.
- mode: Must be set to relaxed.
- materialized_views: A list of materialized views to materialize. Each entry contains the following fields:
  - name: The name of the materialization.
  - type: Specifies whether the materialized view is aggregated or not. Accepts two possible values: aggregated and unaggregated.
    - If type is aggregated, there must be at least one dimension or measure.
    - If type is unaggregated, no dimensions or measures should be defined.
  - dimensions: A list of dimensions to materialize. Only direct references to dimension names are allowed; expressions are not supported.
  - measures: A list of measures to materialize. Only direct references to measure names are allowed; expressions are not supported.
The TRIGGER ON UPDATE clause is not supported for materialization for metric views.
Example definition
version: 0.1
source: prod.operations.orders_enriched_view
filter: revenue > 0
dimensions:
  - name: category
    expr: substring(category, 5)
  - name: color
    expr: color
measures:
  - name: total_revenue
    expr: SUM(revenue)
  - name: number_of_suppliers
    expr: COUNT(DISTINCT supplier_id)
materialization:
  schedule: every 6 hours
  mode: relaxed
  materialized_views:
    - name: baseline
      type: unaggregated
    - name: revenue_breakdown
      type: aggregated
      dimensions:
        - category
        - color
      measures:
        - total_revenue
    - name: suppliers_by_category
      type: aggregated
      dimensions:
        - category
      measures:
        - number_of_suppliers
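The structural rules for the materialization block can be sketched as a small validator. This is illustrative only, the real validation is performed by Databricks when you run CREATE METRIC VIEW or ALTER METRIC VIEW, and the helper function below is hypothetical:

```python
# Hypothetical validator for the structural rules of the `materialization`
# block: required top-level fields, relaxed mode, and the aggregated vs.
# unaggregated constraints on dimensions and measures.

def validate_materialization(config: dict) -> list:
    """Return a list of validation errors for a materialization config."""
    errors = []
    # schedule, mode, and materialized_views are required.
    for field in ("schedule", "mode", "materialized_views"):
        if field not in config:
            errors.append(f"missing required field: {field}")
    # During the experimental release, relaxed is the only supported mode.
    if config.get("mode") != "relaxed":
        errors.append("mode must be 'relaxed'")
    for mv in config.get("materialized_views", []):
        name = mv.get("name", "<unnamed>")
        dims = mv.get("dimensions", [])
        meas = mv.get("measures", [])
        if mv.get("type") == "aggregated":
            if not dims and not meas:
                errors.append(f"{name}: aggregated requires at least one dimension or measure")
        elif mv.get("type") == "unaggregated":
            if dims or meas:
                errors.append(f"{name}: unaggregated must not define dimensions or measures")
        else:
            errors.append(f"{name}: type must be 'aggregated' or 'unaggregated'")
    return errors

# The materialization block from the example definition, as a Python dict.
config = {
    "schedule": "every 6 hours",
    "mode": "relaxed",
    "materialized_views": [
        {"name": "baseline", "type": "unaggregated"},
        {"name": "revenue_breakdown", "type": "aggregated",
         "dimensions": ["category", "color"], "measures": ["total_revenue"]},
    ],
}
print(validate_materialization(config))  # → []
```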
Mode
In relaxed mode, automatic query rewrite only verifies if candidate materialized views have the necessary dimensions and measures to serve the query.
This means that several checks are skipped:
- There are no checks on whether the materialized view is up to date.
- There are no checks on whether you have matching SQL settings (for example, ANSI_MODE or TIMEZONE).
- There are no checks on whether the materialized view returns deterministic results.
The following checks are performed:
- Whether row-level security (RLS) or column-level masking (CLM) are used in materialized views
- Whether current_timestamp()-like functions are used in materialized views
During the experimental release period, relaxed is the only supported mode. If these checks fail, the query falls back to the source data.
Types of materializations for metric views
The following sections explain the types of materialized views available for metric views.
Aggregated type
This type pre-computes aggregations for the specified combinations of measures and dimensions, providing targeted coverage.
This is useful for targeting specific common aggregation query patterns or widgets. Databricks recommends including potential filter columns (columns used at query time in the WHERE clause) as dimensions in the materialized view configuration.
Unaggregated type
This type materializes the entire unaggregated data model (for example, the source, join, and filter fields) for wider coverage with less performance lift compared to the aggregated type.
Use this type when the following are true:
- The source is an expensive view or SQL query.
- Joins defined in your metric view are expensive.
If your source is a direct table reference without a selective filter, an unaggregated materialized view might not provide benefits.
Materialization lifecycle
CREATE and ALTER VIEW behavior
The CREATE and ALTER commands for metric views create or alter the metric view itself synchronously. However, any specified materialized views materialize asynchronously using Lakeflow Spark Declarative Pipelines.
For the CREATE command, Databricks creates a Lakeflow Spark Declarative Pipelines pipeline and schedules an initial update immediately if there are materialized views specified. The metric view remains queryable without materializations by falling back to querying from the source data.
For the ALTER command, no new updates are scheduled, unless you are enabling materialization for the first time. Materialized views are not used for automatic query rewrite until the next scheduled update is complete.
Changing the materialization schedule does not trigger a refresh.
See Manual refresh for finer control over refresh behavior.
Inspect underlying pipeline
Materialization for metric views is implemented using Lakeflow Spark Declarative Pipelines. A link to the pipeline is present in the Overview tab in Catalog Explorer. To learn how to access Catalog Explorer, see What is Catalog Explorer?.
You can also access this pipeline by running DESCRIBE EXTENDED on the metric view. The Refresh Information section contains a link to the pipeline.
DESCRIBE EXTENDED my_metric_view;
Example output:
-- Returns additional metadata such as parent schema, owner, access time etc.
> DESCRIBE EXTENDED my_metric_view;
col_name data_type comment
---------------------------- ------------------------------ ----------
... ... ...
# Detailed Table Information
... ...
Language YAML
Table properties ...
# Refresh information
Latest Refresh status Succeeded
Latest Refresh https://xyz.cloud.databricks.com/#joblist/pipelines/9a594074-e3bc-4487-be68-37fed45ec41e/updates/ef019ede-1585-45f5-a46b-da081cc785d2
Manual refresh
From the link to the Lakeflow Spark Declarative Pipelines page, you can manually start a pipeline update to update the materializations. You can also orchestrate this using an API call based on the pipeline ID.
For example, the following Python script starts a pipeline refresh:
from databricks.sdk import WorkspaceClient

# Authenticate using the default credential chain for the workspace.
client = WorkspaceClient()

# The pipeline ID from the Catalog Explorer link or DESCRIBE EXTENDED output.
pipeline_id = "01484540-0a06-414a-b10f-e1b0e8097f15"

# Trigger an update of the materializations.
client.pipelines.start_update(pipeline_id)
To execute a manual refresh as part of a Lakeflow job, create a Python script with the above logic, and add it as a task of type Python script. You can alternatively create a notebook with the same logic and add a task of type Notebook.
Incremental refresh
The materialized views use incremental refresh whenever possible, and are subject to the same limitations as other materialized views regarding data sources and plan structure.
For details on prerequisites and restrictions, see Incremental refresh for materialized views.
Automatic query rewrite
Queries to a metric view with materialization attempt to use its materializations as much as possible. There are two query rewrite strategies: exact match and unaggregated match.

When you query a metric view, the optimizer analyzes the query and available user-defined materializations. The query automatically runs on the best materialization instead of the base tables using this algorithm:
- First attempts an exact match.
- If an unaggregated materialization exists, tries an unaggregated match.
- If query rewriting fails, the query reads directly from the source tables.
Materializations must finish refreshing before query rewrite can take effect.
Verify query is using materialized views
To check if a query is using a materialized view, run EXPLAIN EXTENDED on your query to see the query plan. If the query is using materialized views, the leaf node will include __materialization_mat___metric_view and the name of the materialization from the YAML file.
Alternatively, the query profile shows the same information.
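As a sketch, scanning the plan text for the marker might look like the following. Only the __materialization_mat___metric_view marker comes from the rewrite; the plan fragment and helper function are hypothetical:

```python
# Illustrative check for the rewrite marker in EXPLAIN EXTENDED output.
# The marker prefix is what the rewrite emits; the sample plan text below
# is hypothetical.

MARKER = "__materialization_mat___metric_view"

def uses_materialization(plan_text: str) -> bool:
    """Return True if the query plan reads from a metric view materialization."""
    return MARKER in plan_text

# Hypothetical plan fragment for a query served by the 'revenue_breakdown'
# materialization:
plan = """
== Physical Plan ==
*(1) HashAggregate(keys=[category], functions=[sum(total_revenue)])
+- Scan catalog.schema.__materialization_mat___metric_view_revenue_breakdown
"""
print(uses_materialization(plan))  # → True
```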
Exact match
To be eligible for the exact match strategy, the grouping expressions of the query must precisely match the materialization dimensions. The aggregation expressions of the query must be a subset of the materialization measures.
Unaggregated match
If an unaggregated materialization is available, this strategy is always eligible.
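The two strategies can be sketched as a simplified matching model. It assumes each materialization is described by its type, dimensions, and measures as in the YAML definition, and it ignores the relaxed-mode checks described earlier:

```python
# Simplified sketch of aggregate-aware query matching for metric views.
# A query is modeled by the dimensions it groups by and the measures it
# aggregates; the routing logic below mirrors the documented order:
# exact match, then unaggregated match, then fallback to source data.

def choose_materialization(query_dims: set, query_measures: set, mats: list):
    """Return the name of the materialization the query would be routed to,
    or None for a fallback to the source data."""
    # 1. Exact match: the query's grouping expressions must equal the
    #    materialization's dimensions, and its aggregations must be a
    #    subset of the materialization's measures.
    for m in mats:
        if (m["type"] == "aggregated"
                and set(m.get("dimensions", [])) == query_dims
                and query_measures <= set(m.get("measures", []))):
            return m["name"]
    # 2. Unaggregated match: always eligible when an unaggregated
    #    materialization exists.
    for m in mats:
        if m["type"] == "unaggregated":
            return m["name"]
    # 3. Otherwise, read directly from the source tables.
    return None

# Materializations from the example definition.
mats = [
    {"name": "baseline", "type": "unaggregated"},
    {"name": "revenue_breakdown", "type": "aggregated",
     "dimensions": ["category", "color"], "measures": ["total_revenue"]},
]

# Groups by category and color, sums revenue -> exact match.
print(choose_materialization({"category", "color"}, {"total_revenue"}, mats))
# → revenue_breakdown

# Groups by category only -> no exact match, so the unaggregated match applies.
print(choose_materialization({"category"}, {"number_of_suppliers"}, mats))
# → baseline
```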
Billing
The refresh of the materialized views is billed under Lakeflow Spark Declarative Pipelines usage.
Known restrictions
The following restrictions apply to materialization for metric views:
- A metric view with materialization that refers to another metric view as source cannot have an unaggregated materialization.