Skip to main content

Model metric views

Metric views create a semantic layer for your data, transforming tables and views into standardized business metrics. They define what to measure, how to aggregate it, and how to segment it. Metric views ensure that every user across the organization reports the same value for the same Key Performance Indicator (KPI), eliminating inconsistent reporting and enabling flexible analysis across any dimensions.

For a full example with joins, dimensions, measures, and agent metadata, see Tutorial: Build a complete metric view with joins.

Core components

A metric view consists of the following elements:

Component

Description

Example

Source

The base table, view, or SQL query containing the data.

samples.tpch.orders

Dimensions

Column attributes used to segment or group metrics.

Product category, Order month, Customer region

Measures

Column aggregations that produce metrics.

COUNT(o_order_id) as Order Count, SUM(o_total_price) as Total Revenue

Filters

Conditions applied to the source data to define scope.

  • status = 'completed'
  • order_date > '2024-01-01'

Joins

Relationships between tables, views, and metric views to enrich data.

Join orders table with customers table on customer_key

Define a source

You can use a table-like asset or a SQL query as the source for your metric view. You must have at least SELECT privileges on any referenced asset.

A table-like asset is any Unity Catalog object that exposes a tabular schema and supports SELECT queries, including tables, views, materialized views, streaming tables, foreign tables, system tables, and metric views.

Use a table-like asset as a source

To use a table-like asset as a source, specify the fully-qualified name. For example: samples.tpch.orders.

Use a metric view as a source

You can use an existing metric view as the source for a new metric view:

YAML
version: 1.1

source: views.examples.source_metric_view

dimensions:
- name: Order month
expr: '`Order Month`'

measures:
- name: Latest order month
expr: MAX(`Order month`)
- name: Latest order year
expr: "DATE_TRUNC('year', MEASURE(`Latest order month`))"

When using a metric view as a source, the same composability rules apply for referencing dimensions and measures. See Composability.

Use a SQL query as a source

To use a SQL query, write the query text directly in the YAML:

YAML
version: 1.1

source: SELECT * FROM samples.tpch.orders o LEFT JOIN samples.tpch.customer c ON o.o_custkey
= c.c_custkey

dimensions:
- name: Order key
expr: o_orderkey

measures:
- name: Order Count
expr: COUNT(o_orderkey)
note

When using a SQL query as a source with a JOIN clause, set primary and foreign key constraints on underlying tables and use the RELY option for optimal query performance. See Declare primary key and foreign key relationships and Query optimization using primary key constraints.

Dimensions

Dimensions are columns used in SELECT, WHERE, and GROUP BY clauses at query time. Each expression must return a scalar value. Dimensions can reference columns from the source data or previously defined dimensions in the metric view. Each dimension consists of two components:

  • name: The alias of the column
  • expr: A SQL expression that references the source data or previously defined dimensions in the metric view

Measures

Measures are expressions that produce results without a pre-determined level of aggregation. They must be expressed using aggregate functions. Measures can reference base fields in the source data, dimensions, or earlier-defined measures. Each measure consists of the following components:

  • name: The alias of the measure
  • expr: An aggregate SQL expression that can include SQL aggregate functions

The following example demonstrates common measure patterns for analyzing order and revenue data. These examples use the TPC-H orders table, which contains sales transaction data including order prices (o_totalprice), customer identifiers (o_custkey), order keys (o_orderkey), order dates (o_orderdate), and priority levels (o_orderpriority):

YAML
measures:
# Simple count measure
- name: Order Count
expr: COUNT(1)

# Sum aggregation measure
- name: Total Revenue
expr: SUM(o_totalprice)

# Distinct count measure
- name: Unique Customers
expr: COUNT(DISTINCT o_custkey)

# Calculated measure combining multiple aggregations
- name: Average Order Value
expr: SUM(o_totalprice) / COUNT(DISTINCT o_orderkey)

# Filtered measure with WHERE condition
- name: High Priority Order Revenue
expr: SUM(o_totalprice) FILTER (WHERE o_orderpriority = '1-URGENT')

# Measure using a dimension
- name: Average Revenue per Month
expr: SUM(o_totalprice) / COUNT(DISTINCT DATE_TRUNC('MONTH', o_orderdate))

See Aggregate functions for a list of aggregate functions.

Apply filters

A filter in the YAML definition applies to all queries that reference the metric view. The following example shows how to write filters as Boolean expressions:

YAML
# Single condition
filter: o_orderdate > '2024-01-01'

# Multiple conditions
filter: o_orderdate > '2024-01-01' AND o_orderstatus = 'F'

# IN clause
filter: o_orderstatus IN ('F', 'P') AND o_orderdate >= '2024-01-01'

Work with joins

Joins in metric views support both direct joins from a fact table to dimension tables (star schema) and multi-hop joins across normalized dimension tables (snowflake schemas).

note

Joined tables can't include MAP type columns. To unpack values from MAP type columns, see Explode nested elements from a map or array.

Model star schemas

In a star schema, the source is the fact table and joins with one or more dimension tables using a LEFT OUTER JOIN. Metric views join the fact and dimension tables needed for the specific query, based on the selected dimensions and measures.

Specify join columns using either an on clause (boolean expression) or a using clause (shared column names). The join should follow a many-to-one relationship. In cases of many-to-many, the first matching row from the joined dimension table is selected.

The following example joins orders (fact table) to customer (dimension table) and exposes customer attributes as dimensions:

YAML
version: 1.1
source: samples.tpch.orders

joins:
- name: customer
source: samples.tpch.customer
on: source.o_custkey = customer.c_custkey

dimensions:
- name: Customer name
expr: customer.c_name
- name: Customer market segment
expr: customer.c_mktsegment

measures:
- name: Total revenue
expr: SUM(o_totalprice)
- name: Order count
expr: COUNT(1)

Use the using clause instead of on when both tables share a column name:

YAML
joins:
- name: customer
source: samples.tpch.customer
using:
- customer_key
note

In an on clause, source refers to the metric view's source table and the join name refers to columns from the joined table. For example, source.o_custkey = customer.c_custkey joins the source table's o_custkey column to the customer table's c_custkey column. If no prefix is provided, the reference defaults to the joined table.

Model snowflake schemas

A snowflake schema extends a star schema by normalizing dimension tables and connecting them to subdimensions. This creates a multi-level join structure.

To define a snowflake schema:

  1. Create a metric view
  2. Add first-level (star schema) joins
  3. Join with other dimension tables
  4. Expose nested dimensions by adding dimensions in your view

The following example uses the TPC-H dataset to illustrate a snowflake schema showing the geographic hierarchy of orders. The example joins the orders table to customers, then to their nations (countries), and finally to their regions (continents). This multi-level join structure enables analysis like "show revenue by region" or "compare customer distribution across nations". The TPC-H dataset is available in the samples catalog in your Databricks workspace.

YAML
source: samples.tpch.orders

joins:
- name: customer
source: samples.tpch.customer
on: source.o_custkey = customer.c_custkey
joins:
- name: nation
source: samples.tpch.nation
on: customer.c_nationkey = nation.n_nationkey
joins:
- name: region
source: samples.tpch.region
on: nation.n_regionkey = region.r_regionkey

dimensions:
- name: clerk
expr: o_clerk
- name: customer
expr: customer
comment: returns the full customer row as a struct
- name: customer_name
expr: customer.c_name
- name: nation
expr: customer.nation
- name: nation_name
expr: customer.nation.n_name

YAML syntax and formatting

Metric view definitions follow standard YAML notation syntax. See Metric view YAML syntax reference for the required syntax and formatting.

Best practices

Use the following guidelines when modeling metric views:

  • Model atomic measures: Start by defining the simplest measures first (for example, SUM(revenue), COUNT(DISTINCT customer_id)). Build complex measures using composability.
  • Standardize dimension values: Use transformations (such as CASE statements) to convert database codes into clear business names (for example, convert order status 'O' to 'Open' and 'F' to 'Fulfilled').
  • Define scope with filters: If a metric view should only include completed orders, define that filter in the metric view so users can't accidentally include incomplete data.
  • Use clear naming: Metric names should be recognizable to business users (for example, "Customer Lifetime Value" instead of cltv_agg_measure).
  • Separate time dimensions: Include granular time dimensions (such as "Order Date") and truncated time dimensions (such as "Order Month" or "Order Week") to support both detail-level and trend analysis.

Next steps