Skip to main content

Metric view YAML reference

Preview

This feature is in Public Preview.

This page describes each component of the YAML used to define a metric view.

YAML overview

The YAML definition for a metric view includes six top-level fields:

  • version: Defaults to 0.1. This is the version of the metric view specification.
  • source: The source data for the metric view. This can be a table-like asset or a SQL query.
  • joins: Optional. Used to LEFT JOIN the fact table defined under source with dimension tables as a star schema model.
  • filter: Optional. A SQL boolean expression that applies to all queries; equivalent to the WHERE clause.
  • dimensions: An array of dimension definitions, including the dimension name and expression.
  • measures: An array of aggregate expression columns.

General notation

Metric view definitions follow standard YAML notation syntax. See yaml.org's documentation to learn more about YAML specifications.

When referencing column names with spaces or special characters in expressions in a YAML definition, consider the following:

  • Backtick-escaping: Enclose the column name in backticks (`) to ensure proper referencing. For example, to reference a column with a space in its name, use `column name`.

  • Backtick-escaping surrounded by double quotation marks: If the expression starts with a backtick-escaped column name and needs to be part of a string in YAML, wrap the entire expression in double quotation marks to be YAML compliant. For example, when using a column name with spaces, the whole expression might look like "`column name`".

Source

You can use a table-like asset or a SQL query as the source for your metric view. To use a table-like asset, you must have at least SELECT privileges on the asset.

Use a table as a source

To use a table as a source, include the fully-qualified table name, as in the following example.

source: samples.tpch.orders

Use a SQL query as a source

To use a SQL query, write the query text directly in the YAML.

source: SELECT * FROM samples.tpch.orders o
LEFT JOIN samples.tpch.customer c
ON o.o_custkey = c.c_custkey

note

When using a SQL query as a source with a JOIN clause, Databricks recommends setting primary and foreign key constraints on underlying tables and using the RELY option for optimal performance at query time, if applicable. For more information about using primary and foreign key constraints, see Declare primary key and foreign key relationships and Query optimization using primary key constraints.

Use metric view as a source

You can also use an existing metric view as the source for a new metric view:

version: 0.1
source: views.examples.source_metric_view

dimensions:

# Dimension referencing dimension from source_metric_view
- name: Order date
expr: order_date_dim

measures:

# Measure referencing dimension from source_metric_view
- name: Latest order month
expr: MAX(order_date_dim_month)

# Measure referencing measure from source_metric_view
- name: Latest order year
expr: DATE_TRUNC('year', MEASURE(max_order_date_measure))

When using a metric view as a source:

  • Dimensions in the new metric view can reference any dimension in the source metric view.

  • Measures in the new metric view can reference any dimension or measure in the source metric view.

All other composability rules apply. See Composability.

Filter

A filter in the YAML definition of a metric view applies to all queries that reference it. It must be written as a SQL boolean expression and is equivalent to using a WHERE clause in a SQL query.

Joins

Joins are used to model star schemas, where the source is treated as a fact table and can be joined with multiple dimension tables using a LEFT OUTER JOIN. You can specify join columns using either an on or using clause. The on clause lets you define the join condition with a boolean expression. The using clause references columns with the same names in both tables.

The join should follow a many-to-one relationship. The following example shows how to express a join in the definition of a metric view.

source: catalog.schema.fact_table

joins:

# The on clause supports a boolean expression
- name: dimension_table_1
source: catalog.schema.dimension_table_1
on: source.dimension_table_1_fk = dimension_table_1.pk

# The using clause supports an array of columns
# found in both of the tables being joined.
- name: dimension_table_2
source: catalog.schema.dimension_table_2
using:
- dimension_table_2_key_a
- dimension_table_2_key_b

dimensions:

# Dimension referencing a join column from dimension_table_1 using dot notation
- name: Dimension table 1 key
expr: dimension_table_1.pk

measures:

# Measure referencing a join column from dimension_table_1
- name: Count of dimension table 1 keys
expr: COUNT(dimension_table_1.pk)

note

The source namespace references columns from the metric view’s source, while the join name refers to columns from the joined table. For example, in the join condition source.dimension_table_1_fk = dimension_table_1.pk, source refers to the metric view’s source table (fact_table), and dimension_table_1 refers to the joined table. The reference defaults to the join table if no prefix is provided in an on clause.

Dimensions

Dimensions are used in SELECT, WHERE, and GROUP BY clauses at query time. Each expression must return a scalar value. Each dimension consists of two components:

  • name: The alias of the column.

  • expr: A SQL expression on the source data that defines the dimension.

The following example demonstrates how to define dimensions:

dimensions:

# Column name
- name: Order date
expr: o_orderdate

# SQL expression
- name: Order month
expr: DATE_TRUNC('MONTH', `Order date`)

# Referring to a column with a space in the name
- name: Month of order
expr: `Order month`

# Multi-line expression
- name: Order status
expr: CASE
WHEN o_orderstatus = 'O' THEN 'Open'
WHEN o_orderstatus = 'P' THEN 'Processing'
WHEN o_orderstatus = 'F' THEN 'Fulfilled'
END

Measures

Measures are an array of aggregate expressions that define aggregated results without a pre-determined level of aggregation. They must be expressed as aggregate functions. To reference a measure in a query, you must use the MEASURE function. Each measure consists of the following components:

  • name: The alias of the measure.

  • expr: An aggregate SQL expression that can include SQL aggregate functions.

See Aggregate functions for a list of aggregate functions.

See measure aggregate function.

The following example demonstrates how to define measures:

measures:

# Basic aggregation
- name: Total revenue
expr: SUM(o_totalprice)

# Basic aggregation with ratio
- name: Total revenue per customer
expr: SUM(`Total revenue`) / COUNT(DISTINCT o_custkey)

# Measure-level filter
- name: Total revenue for open orders
expr: COUNT(o_totalprice) FILTER (WHERE o_orderstatus='O')

# Measure-level filter with multiple aggregate functions
# filter needs to be specified for each aggregate function in the expression
- name: Total revenue per customer for open orders
expr: SUM(o_totalprice) FILTER (WHERE o_orderstatus='O')/COUNT(DISTINCT o_custkey) FILTER (WHERE o_orderstatus='O')

Window measures

Beta

This feature is in Beta.

Window measures enable you to define measures with windowed, cumulative, or semiadditive aggregations in your metric views. These types of measures allow for more complex calculations, such as moving averages, period-over-period changes, and running totals. See Use window measures in metric views for examples that demonstrate how to use window measures in metric views.

Composability

Metric views are composable, allowing you to build complex logic by referencing previously defined elements.

In a metric view definition:

  • Dimensions can reference dimensions previously defined in the YAML.
  • Measures can reference all dimensions.
  • Measures can reference measures previously defined using the MEASURE() function.

The following example shows how dimensions and measures can be composed:

dimensions:

# Dimension referencing a source column
- name: Order month
expr: DATE_TRUNC('month', o_orderdate)

# Dimension referencing a previously defined dimension
- name: Previous order month
expr: ADD_MONTHS(`Order Month`, -1)

measures:

# Measure referencing a dimension
- name: Earliest order month
expr: MIN(`Order month`)

# Measure referencing a source column
- name: Revenue
expr: SUM(sales_amount)

# Measure referencing a source column
- name: Costs
expr: SUM(item_cost)

# Measure referencing previously defined measures
- name: Profit
expr: MEASURE(Revenue) - MEASURE(Costs)