Joins in metric views

Joins in metric views enrich your source data with attributes from related tables. They support direct joins from a fact table to dimension tables (star schema), multi-hop joins across normalized dimension tables (snowflake schema), and one-to-many joins that aggregate facts from related tables. By default, all joins are many-to-one, so each source row matches at most one row in the joined table.

Star schema joins

In a star schema, the source is the fact table and joins with one or more dimension tables using a LEFT OUTER JOIN. Metric views join the fact and dimension tables needed for the specific query, based on the selected fields and measures.

Specify join columns using either an on clause (Boolean expression) or a using clause (shared column names). The join must follow a many-to-one relationship. In cases of many-to-many, the engine selects the first matching row from the joined dimension table.

The following example joins the orders fact table to the customer dimension table with an on clause, which takes a Boolean expression:

YAML
version: 1.1
source: samples.tpch.orders

joins:
  # The on clause supports a Boolean expression
  - name: customer
    source: samples.tpch.customer
    on: source.o_custkey = customer.c_custkey

fields:
  # Field referencing a join column using dot notation
  - name: Customer name
    expr: customer.c_name
  - name: Customer market segment
    expr: customer.c_mktsegment

measures:
  # Measure referencing a join column
  - name: Total revenue
    expr: SUM(o_totalprice)
  - name: Order count
    expr: COUNT(1)

When the join columns have the same name in both tables, use a using clause instead of an on clause. The using clause takes an array of column names that exist in both the source and the joined table. No dataset in the samples catalog has tables that share a join column name, so the following example uses placeholder table and column names to illustrate the syntax:

YAML
joins:
  - name: customer
    source: catalog.schema.customer
    using:
      - customer_id

note

In an on clause, source refers to the metric view's source table and the join name refers to columns from the joined table. For example, source.o_custkey = customer.c_custkey joins the source table's o_custkey column to the customer table's c_custkey column. If no prefix is provided, the reference defaults to the joined table.

Snowflake schema joins

A snowflake schema extends a star schema by normalizing dimension tables and connecting them to subdimensions. This creates a multi-level join structure.

To define a snowflake schema:

Create a metric view.
Add first-level (star schema) joins.
Join with other dimension tables.
Make nested attributes available by adding fields in your view.

The following example uses the TPC-H dataset to illustrate a snowflake schema showing the geographic hierarchy of orders. The example joins the orders table to customers, then to their nations (countries or regions), and finally to their regions (continents). The TPC-H dataset is available in the samples catalog in your Databricks workspace.

YAML
source: samples.tpch.orders

joins:
  - name: customer
    source: samples.tpch.customer
    on: source.o_custkey = customer.c_custkey
    joins:
      - name: nation
        source: samples.tpch.nation
        on: customer.c_nationkey = nation.n_nationkey
        joins:
          - name: region
            source: samples.tpch.region
            on: nation.n_regionkey = region.r_regionkey

fields:
  - name: clerk
    expr: o_clerk
  - name: customer
    expr: customer
    comment: returns the full customer row as a struct
  - name: customer_name
    expr: customer.c_name
  - name: nation
    expr: customer.nation
  - name: nation_name
    expr: customer.nation.n_name

Join cardinality

The cardinality field on a join controls the relationship between the source table and the joined table. This field determines how the engine processes measures that reference columns from the joined table.

The following table compares the two supported cardinalities:

Property	`many_to_one` (default)	`one_to_many`
Rows matched per source row	At most one	Zero or more
Typical use	Dimension lookup	Fact expansion
Allowed in `fields`	Yes	No
Allowed in `measures`	Yes	Yes

Many-to-one joins

Many-to-one is the default cardinality. Each row in the source matches at most one row in the joined table, so the joined table acts as a dimension lookup. You can omit the cardinality field for many-to-one joins, or state cardinality: many_to_one explicitly.

Both fields and measures can reference columns from a many-to-one join using dot notation (for example, customer.c_name).

Declare join constraints with `rely`

Setting rely.at_most_one_match: true declares that the join has no fan-out on the "one" side:

On a many-to-one join, each source row matches at most one row in the joined table.
On a one-to-many join, each joined row matches at most one source row.

This declaration allows the engine to skip unnecessary joins and reduce data scanned, especially for queries that filter on fields from the joined table. Databricks recommends setting rely on both cardinalities when the constraint holds.

warning

Set at_most_one_match: true only when the relationship genuinely holds. This property is not validated at runtime. If the asserted side produces a fan-out, measures return incorrect results.

The following example joins orders to customer with rely enabled:

YAML
version: 1.1
source: samples.tpch.orders

joins:
  - name: customer
    source: samples.tpch.customer
    on: source.o_custkey = customer.c_custkey
    rely:
      at_most_one_match: true

fields:
  - name: Customer name
    expr: customer.c_name
  - name: Customer market segment
    expr: customer.c_mktsegment

measures:
  - name: Total revenue
    expr: SUM(o_totalprice)
  - name: Order count
    expr: COUNT(1)

See Optimize joins with rely for the complete rely field reference.

One-to-many joins

Set cardinality: one_to_many to allow a single source row to match multiple rows in the joined table. This turns that table into a fact source that the engine aggregates independently at the source grain.

note

One-to-many joins require Databricks Runtime 18.1 or above and YAML specification version 1.1. See Metric view feature availability.

A one-to-many join allows a single metric view to measure facts that live at different grains, such as orders per customer or events per account, without duplicating the source rows in query results. The source acts as the dimensional spine: each entity appears exactly once, regardless of how many matching rows exist in the joined table.

One-to-many join example

The following example uses customer as the source and joins orders with cardinality: one_to_many. A many_to_one join to nation supplies the nation_name field. Qualify the source side of each join condition with source. so the reference resolves to the metric view's source table. Both joins set rely.at_most_one_match: true: on the nation join it asserts each customer has at most one nation, and on the orders join it asserts each order belongs to at most one customer. See Declare join constraints with rely.

YAML
version: 1.1
source: samples.tpch.customer

joins:
  - name: nation
    source: samples.tpch.nation
    on: nation.n_nationkey = source.c_nationkey
    rely:
      at_most_one_match: true
  - name: orders
    source: samples.tpch.orders
    on: orders.o_custkey = source.c_custkey
    cardinality: one_to_many
    rely:
      at_most_one_match: true

fields:
  - name: customer_name
    expr: c_name
  - name: nation_name
    expr: nation.n_name

measures:
  - name: customer_count
    expr: count(*)
  - name: order_count
    expr: count(orders.o_orderkey)
  - name: total_order_revenue
    expr: sum(orders.o_totalprice)

In this view, customer_count counts rows in the source customer table, while order_count and total_order_revenue aggregate rows from the orders branch. A customer with two orders returns an order_count of 2 while customer_count stays 1, which confirms that the source rows are not duplicated. A customer with no orders still appears in results, with an order_count of 0 and a NULL total_order_revenue.

Nested one-to-many joins

To measure facts that are two or more levels below the source, nest one-to-many joins. All joins in a one-to-many subtree must share the same cardinality, so a one-to-many parent can't have a many-to-one child. Reference a column in a nested join with its full dot-path through the join names.

The following example nests lineitem under orders so that a single customer-grain view can count both orders and line items:

YAML
version: 1.1
source: samples.tpch.customer

joins:
  - name: orders
    source: samples.tpch.orders
    on: orders.o_custkey = source.c_custkey
    cardinality: one_to_many
    joins:
      - name: lineitem
        source: samples.tpch.lineitem
        on: lineitem.l_orderkey = orders.o_orderkey
        cardinality: one_to_many

fields:
  - name: customer_name
    expr: c_name

measures:
  - name: order_count
    expr: count(distinct orders.o_orderkey)
  - name: line_item_count
    expr: count(orders.lineitem.l_linenumber)
  - name: total_line_revenue
    expr: sum(orders.lineitem.l_extendedprice)

The measures reference nested columns with their full dot-path through the join names, such as orders.lineitem.l_extendedprice, because lineitem is reachable only through orders. Use count(distinct orders.o_orderkey) rather than a plain count for the order count: each order fans out into multiple line items, so a plain count would count an order once per line item.

Sibling one-to-many joins

Define multiple one-to-many joins at the same level to measure independent fact sources from a single view. The engine aggregates sibling joins separately and then blends them, so their rows never cross-multiply. Top-level siblings can mix cardinalities freely, so a many_to_one dimension join and a one_to_many fact join can coexist at the same level.

The following example uses nation as the source and adds two independent one-to-many branches, customer and supplier:

YAML
version: 1.1
source: samples.tpch.nation

joins:
  - name: customer
    source: samples.tpch.customer
    on: customer.c_nationkey = source.n_nationkey
    cardinality: one_to_many
  - name: supplier
    source: samples.tpch.supplier
    on: supplier.s_nationkey = source.n_nationkey
    cardinality: one_to_many

fields:
  - name: nation_name
    expr: n_name

measures:
  - name: customer_count
    expr: count(customer.c_custkey)
  - name: supplier_count
    expr: count(supplier.s_suppkey)
  - name: customers_per_supplier
    expr: count(customer.c_custkey) / count(supplier.s_suppkey)

The customers_per_supplier measure divides two independent aggregations after the engine blends each one to the query grain. You can combine measures from different sources with arithmetic, but a single aggregation function must reference columns from only one source.

Connect multiple fact tables with a bridge table

A metric view models a single fact table joined to dimension tables. To combine measures from two or more fact tables that sit at different grains, define a bridge that enumerates the valid combinations of the dimensions the facts share, directly in the metric view's source. For example, the samples.tpch shipping fact lineitem (grain: order line) and the supply fact partsupp (grain: part and supplier) both share the part and supplier dimensions.

A bridge makes the set of valid dimension combinations explicit, so query results stay predictable. The metric view returns only the combinations you declare valid rather than inferring them for each query. Set cardinality: one_to_many on each fact join so the engine aggregates each fact independently against the shared bridge without fanning out and double-counting.

To build the bridge, define it as a SQL query in the metric view source, join each fact table to it on its shared columns, then declare fields on the shared dimension columns and measures on each fact. Use a CROSS JOIN when every combination of the shared dimensions is valid:

YAML
version: 1.1
source: SELECT * FROM samples.tpch.part CROSS JOIN samples.tpch.supplier
filter: s_suppkey IN (11315, 42920) AND p_partkey IN (30419, 80418)

joins:
  - name: lineitem
    source: samples.tpch.lineitem
    on: source.p_partkey = lineitem.l_partkey AND source.s_suppkey = lineitem.l_suppkey
    cardinality: one_to_many
  - name: partsupp
    source: samples.tpch.partsupp
    on: source.p_partkey = partsupp.ps_partkey AND source.s_suppkey = partsupp.ps_suppkey
    cardinality: one_to_many

fields:
  - name: part_name
    expr: p_name
  - name: part_brand
    expr: p_brand
  - name: part_type
    expr: p_type
  - name: part_size
    expr: p_size
  - name: manufacturer
    expr: p_mfgr
  - name: supplier_name
    expr: s_name

measures:
  - name: lineitem_count
    expr: COUNT(lineitem.*)
  - name: total_quantity_sold
    expr: SUM(lineitem.l_quantity)
  - name: gross_revenue
    expr: SUM(lineitem.l_extendedprice)
  - name: net_revenue
    expr: SUM(lineitem.l_extendedprice * (1 - lineitem.l_discount))
  - name: distinct_orders
    expr: COUNT(DISTINCT lineitem.l_orderkey)
  - name: available_quantity
    expr: SUM(partsupp.ps_availqty)
  - name: avg_supply_cost
    expr: AVG(partsupp.ps_supplycost)
  - name: total_supply_value
    expr: SUM(partsupp.ps_availqty * partsupp.ps_supplycost)

A measure over a fact table counts only the records whose shared-dimension values appear in the bridge. Combinations that the bridge doesn't include don't contribute to the results.

When you only want the combinations that actually occur, swap the source for a UNION (or FULL OUTER JOIN) of the distinct pairs from each fact so that each fact contributes its members. The joins, fields, and measures stay the same:

YAML
source: |
  SELECT DISTINCT l_partkey AS p_partkey, l_suppkey AS s_suppkey FROM samples.tpch.lineitem
  UNION
  SELECT DISTINCT ps_partkey AS p_partkey, ps_suppkey AS s_suppkey FROM samples.tpch.partsupp

One-to-many join restrictions

Fields can't reference a one-to-many join: A field must resolve to exactly one value per source row. Because a one-to-many column can have multiple values per source row, you can't use it in a fields definition. To use such a column as a field, make that table the source and join the original source as a many_to_one join instead.
A single aggregation can't span sources: Each aggregation function must reference columns from one source. Arithmetic between the results of two aggregations is allowed, such as count(orders.o_orderkey) / count(*), but a single function can't combine columns from two sources.
A join subtree can't mix cardinalities: All descendants of a one-to-many join must also be one-to-many, and all descendants of a many-to-one join must be many-to-one. Only top-level siblings can mix cardinalities.

Star schema joins​

Snowflake schema joins​

Join cardinality​

Many-to-one joins​

Declare join constraints with rely​

One-to-many joins​

One-to-many join example​

Nested one-to-many joins​

Sibling one-to-many joins​

Connect multiple fact tables with a bridge table​

One-to-many join restrictions​

Additional resources​