Choose a materialization type for metric views

This page describes how to select between aggregated and unaggregated materializations for metric views based on your query patterns. For what each type is and how it works, see Types of materializations for metric views.

Use the following table to find the right approach for your situation. The sections that follow answer each question in detail.

Your situation	Approach
You run the same query patterns often and know which dimensions you group by.	Aggregated materialization
You query a non-additive measure, such as `COUNT(DISTINCT)`, at a fixed grain.	Aggregated materialization with dimensions that match the query's `GROUP BY`
You run ad hoc queries over joined or filtered data and can't predict the `GROUP BY`.	Unaggregated materialization
You have both predictable dashboards and ad hoc queries on the same metric view.	Both types together
Your metric view points at a single table with no joins or filters.	Neither; use aggregated materializations for known patterns, or skip materialization

Your situation	Approach
You run the same query patterns often and know which dimensions you group by.	Aggregated materialization
You query a non-additive measure, such as `COUNT(DISTINCT)`, at a fixed grain.	Aggregated materialization with dimensions that match the query's `GROUP BY`
You run ad hoc queries over joined or filtered data and can't predict the `GROUP BY`.	Unaggregated materialization
You have both predictable dashboards and ad hoc queries on the same metric view.	Both types together
Your metric view points at a single table with no joins or filters.	Neither; use aggregated materializations for known patterns, or skip materialization

Aggregated materializations

An aggregated materialization is a pre-built answer table for a specific type of question. It serves matching queries faster by returning pre-computed results instead of scanning source data.

The following examples use a metric view on sales data with the fields region, category, and order_date, and the measures total_revenue (SUM), order_count (COUNT), and unique_customers (COUNT(DISTINCT)).

How do I speed up a query I run frequently?

Create an aggregated materialization for it. A query you run daily is a good candidate, because the materialization returns pre-computed results instead of scanning the source data. For example, suppose you run this query every morning:

SQL
SELECT region, MEASURE(total_revenue) FROM sales_mv GROUP BY ALL

If you commonly query region and order_date together, include both fields in one materialization:

YAML
- name: revenue_by_region_date
  type: aggregated
  dimensions:
    - region
    - order_date
  measures:
    - total_revenue
    - order_count

Materializing at a finer grain (region and date instead of region alone) means that any query grouping by region alone, order_date alone, or both can use this materialization. Including additive measures such as order_count lets the same materialization serve queries for those measures, so you don't need to create a separate one for each.

How do I know if an existing materialization covers a new query?

Compare the query's GROUP BY dimensions to the materialization's dimensions. If the materialization doesn't include a dimension you group by, the query can't use it. For example, suppose you want revenue by category, but your only materialization is the revenue_by_region_date example shown earlier. Because it doesn't include category, queries grouping by category fall back to an unaggregated materialization (if one exists) or the source tables.

If you query by category frequently, create a separate materialization for it. If the query is infrequent or already fast enough, don't create one. Each materialization adds storage and refresh cost.

How do I speed up a query with a non-additive measure?

Create an aggregated materialization whose dimensions match the query's GROUP BY exactly. Non-additive measures, such as COUNT(DISTINCT), can't roll up from a finer-grained materialization, so a materialization at a different grain won't help. For example, suppose this query is slow:

SQL
SELECT region, MEASURE(unique_customers) FROM sales_mv GROUP BY ALL

unique_customers uses COUNT(DISTINCT), which is non-additive. The revenue_by_region_date materialization shown earlier has different dimensions, so it can't serve this query. Create a materialization with dimensions that match:

YAML
- name: customers_by_region
  type: aggregated
  dimensions:
    - region
  measures:
    - unique_customers

Unaggregated materializations

An unaggregated materialization is a pre-built starting point, not a pre-built answer. It performs the expensive work of joining tables and applying filters once, so queries can aggregate from the joined result instead of re-joining the source tables on every run.

Aggregation still happens at query time, so unaggregated materializations are not as fast as aggregated ones. They are faster than re-joining from the raw source tables on every query.

The following examples use a metric view that joins three tables and applies a filter:

YAML
source: raw_events
filter: event_type = 'purchase'
joins:
  - name: customers
    source: dim_customers
    on: customers.id = source.customer_id
  - name: products
    source: dim_products
    on: products.id = source.product_id

Which type should I use for unpredictable query patterns?

Use an unaggregated materialization. When you run ad hoc queries constantly and can't predict the GROUP BY, it's difficult to define aggregated materializations that cover the right fields. An unaggregated materialization sidesteps this problem: it materializes the joined, filtered dataset once, and any query can use it regardless of its shape.

YAML
materialized_views:
  - name: baseline
    type: unaggregated

Should I materialize a single table with no joins?

An unaggregated materialization on a single table with no joins or filters duplicates the table with no benefit. Use aggregated materializations for known query patterns, or skip materialization entirely.

Can I use both materialization types together?

Yes. Use an unaggregated materialization as a fallback, and aggregated materializations for your known high-traffic queries. This pattern fits a metric view that has expensive joins along with a dashboard of known widgets. Query rewrite prefers aggregated materializations (exact or rollup match) when possible and falls back to unaggregated for everything else.

YAML
materialized_views:
  - name: baseline
    type: unaggregated
  - name: revenue_by_region_date
    type: aggregated
    dimensions:
      - region
      - order_date
    measures:
      - total_revenue

When creating materializations, target your slowest or highest-traffic queries first. Add more materializations when you observe queries falling back to source. To check whether a query uses a materialization, see Verify a query is using materialized views.

Aggregated materializations​

How do I speed up a query I run frequently?​

How do I know if an existing materialization covers a new query?​

How do I speed up a query with a non-additive measure?​

Unaggregated materializations​

Which type should I use for unpredictable query patterns?​

Should I materialize a single table with no joins?​

Can I use both materialization types together?​