Choose a materialization type for metric views
This page describes how to choose between aggregated and unaggregated materializations for metric views based on your query patterns. For what each type is and how it works, see Types of materializations for metric views.
Use the following table to find the right approach for your situation. The sections that follow answer each question in detail.
Your situation | Approach |
|---|---|
You run the same query patterns often and know which dimensions you group by. | |
You query a non-additive measure, such as | Aggregated materialization with dimensions that match the query's |
You run ad hoc queries over joined or filtered data and can't predict the | |
You have both predictable dashboards and ad hoc queries on the same metric view. | |
Your metric view points at a single table with no joins or filters. | Neither; use aggregated materializations for known patterns, or skip materialization |
Aggregated materializations
An aggregated materialization is a pre-built answer table for a specific type of question. It serves matching queries faster by returning pre-computed results instead of scanning source data.
The following examples use a metric view on sales data with the fields region, category, and order_date, and the measures total_revenue (SUM), order_count (COUNT), and unique_customers (COUNT(DISTINCT)).
How do I speed up a query I run frequently?
Create an aggregated materialization for it. A query you run daily is a good candidate, because the materialization returns pre-computed results instead of scanning the source data. For example, suppose you run this query every morning:
SELECT region, MEASURE(total_revenue) FROM sales_mv GROUP BY ALL
If you commonly query region and order_date together, include both fields in one materialization:
- name: revenue_by_region_date
type: aggregated
dimensions:
- region
- order_date
measures:
- total_revenue
- order_count
Materializing at a finer grain (region and date instead of region alone) means that any query grouping by region alone, order_date alone, or both can use this materialization. Including additive measures such as order_count lets the same materialization serve queries for those measures, so you don't need to create a separate one for each.
How do I know if an existing materialization covers a new query?
Compare the query's GROUP BY dimensions to the materialization's dimensions. If the materialization doesn't include a dimension you group by, the query can't use it. For example, suppose you want revenue by category, but your only materialization is the revenue_by_region_date example shown earlier. Because it doesn't include category, queries grouping by category fall back to an unaggregated materialization (if one exists) or the source tables.
If you query by category frequently, create a separate materialization for it. If the query is infrequent or already fast enough, don't create one. Each materialization adds storage and refresh cost.
How do I speed up a query with a non-additive measure?
Create an aggregated materialization whose dimensions match the query's GROUP BY exactly. Non-additive measures, such as COUNT(DISTINCT), can't roll up from a finer-grained materialization, so a materialization at a different grain won't help. For example, suppose this query is slow:
SELECT region, MEASURE(unique_customers) FROM sales_mv GROUP BY ALL
unique_customers uses COUNT(DISTINCT), which is non-additive. The revenue_by_region_date materialization shown earlier has different dimensions, so it can't serve this query. Create a materialization with dimensions that match:
- name: customers_by_region
type: aggregated
dimensions:
- region
measures:
- unique_customers
Unaggregated materializations
An unaggregated materialization is a pre-built starting point, not a pre-built answer. It performs the expensive work of joining tables and applying filters once, so queries can aggregate from the joined result instead of re-joining the source tables on every run.
Aggregation still happens at query time, so unaggregated materializations are not as fast as aggregated ones. They are faster than re-joining from the raw source tables on every query.
The following examples use a metric view that joins three tables and applies a filter:
source: raw_events
filter: event_type = 'purchase'
joins:
- name: customers
source: dim_customers
on: customers.id = source.customer_id
- name: products
source: dim_products
on: products.id = source.product_id
Which type should I use for unpredictable query patterns?
Use an unaggregated materialization. When you run ad hoc queries constantly and can't predict the GROUP BY, it's difficult to define aggregated materializations that cover the right fields. An unaggregated materialization sidesteps this problem: it materializes the joined, filtered dataset once, and any query can use it regardless of its shape.
materialized_views:
- name: baseline
type: unaggregated
Should I materialize a single table with no joins?
An unaggregated materialization on a single table with no joins or filters duplicates the table with no benefit. Use aggregated materializations for known query patterns, or skip materialization entirely.
Can I use both materialization types together?
Yes. Use an unaggregated materialization as a fallback, and aggregated materializations for your known high-traffic queries. This pattern fits a metric view that has expensive joins along with a dashboard of known widgets. Query rewrite prefers aggregated materializations (exact or rollup match) when possible and falls back to unaggregated for everything else.
materialized_views:
- name: baseline
type: unaggregated
- name: revenue_by_region_date
type: aggregated
dimensions:
- region
- order_date
measures:
- total_revenue
When creating materializations, target your slowest or highest-traffic queries first. Add more materializations when you observe queries falling back to source. To check whether a query uses a materialization, see Verify a query is using materialized views.