Skip to main content

What are custom calculations?

Custom calculations let you define dynamic metrics and transformations without modifying dataset queries. This article explains how to use custom calculations in AI/BI dashboards.

Why use custom calculations?

Custom calculations allow you to create and visualize new fields from existing dashboard datasets without modifying the source SQL. You can define two types of custom calculations:

  • Calculated measures: Aggregated values such as total sales or average cost. These can use the AGGREGATE OVER command to calculate values across time ranges.
  • Calculated dimensions: Unaggregated values or transformations such as categorizing age ranges or formatting strings.

Custom calculations behave similarly to metric views, but are scoped to the dataset and dashboard where they are defined. To define custom metrics that can be used with other data assets, see Unity Catalog metric views.

Create dynamic metrics with calculated measures

Suppose you have the following dataset:

Item

Region

Price

Cost

Date

Apples

USA

30

15

2024-01-01

Apples

Canada

20

10

2024-01-01

Oranges

USA

20

15

2024-01-02

Oranges

Canada

15

10

2024-01-02

You want to visualize profit margin by region. Without custom calculations, you would need to create a new dataset with a margin column:

Region

Margin

USA

0.40

Canada

0.43

While this approach works, the new dataset is static and might only support a single visualization. Filters applied to the original dataset do not affect the new dataset without additional manual adjustments.

With custom calculations, you can express the profit margin as an aggregation using the following formula:

SQL
(SUM(Price) - SUM(Cost)) / SUM(Price)

This measure is dynamic. When used in a visualization, it automatically updates to reflect filters applied to the dataset.

Define calculated measures over a range

A common task in dashboard visualizations is to compute a measure, such as SUM(sales) across a range, such as the past 7-days. For dashboard datasets, use the AGGREGATE OVER command to define these types of range-based measure calculations.

The AGGREGATE OVER command can incorporate data outside of the current group or partition. For example, if a visualization groups data by day, a trailing 7-day range allows each day's data point to include data from the previous 6 days. If the same measure is used in a visualization grouped by month, the 7-day range uses only the last 7 days in each month.

Using the same dataset as the previous example, the following expression computes the trailing 7-day average profit margin.

SQL
(
(SUM(Price) - SUM(Cost)) / SUM(Price)
) AGGREGATE OVER (
ORDER BY Date
TRAILING 7 DAY
)

After creation, this measure can be applied in any visualization.

AGGREGATE OVER syntax

The AGGREGATE OVER command requires the following syntax:

{expr} AGGREGATE OVER (ORDER BY {field} {frame})

Arguments

  • expr

    A valid calculated measure expression to evaluate

  • field (required)

    A valid column name

  • frame (required) Can be one of the following:

    • CURRENT
    • CUMULATIVE
    • ALL
    • (TRAILING|LEADING) number unit
      • number is an integer
      • unit is DAY, MONTH, or YEAR
      • example: TRAILING 7 DAY or LEADING 1 MONTH

The following table identifies how the frame specification for aggregate over compares to the equivalent SQL window frame clause.

Frame specification

Equivalent SQL window frame clause

CURRENT

RANGE BETWEEN CURRENT ROW AND CURRENT ROW

ALL

RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

CUMULATIVE

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

TRAILING <NUMBER> <UNIT>

RANGE BETWEEN <NUMBER> <UNIT> PRECEDING and 1 <UNIT> PRECEDING

LEADING <NUMBER> <UNIT>

RANGE BETWEEN 1 <UNIT> FOLLOWING and <NUMBER> <UNIT> FOLLOWING

AGGREGATE OVER compared to window functions

The AGGREGATE OVER command is similar to window functions in SQL, which are often used to calculate metrics. However, unlike window functions, you can use AGGREGATE OVER without specifying a partition field in the expression. Instead, it inherits partitions from the visualization query's grouping. This means you can use these measures more dynamically. For example, you can calculate a 7-day moving average by product line, by adding the measure to the y-axis of a visualization and product line to the x-axis.

If the order by field is not grouped on in the visualization, AGGREGATE OVER will take the last row's aggregated value as the value to display for each group. If you're familiar with window functions, you might recognize this as similar to a semi-additive measure, where this is equivalent to last semi-additive behavior.

Define unaggregated values with custom dimensions

Calculated dimensions let you define unaggregated values or lightweight transformations without changing the source dataset. This is helpful when you want to organize or reformat data for visualization.

For example, to analyze age trends by age group instead of individual ages, you can define a custom age_group dimension using the following expression:

SQL
CASE
WHEN age < 18 THEN '<18'
WHEN age >= 18 AND age < 25 THEN '18–24'
WHEN age >= 25 AND age < 35 THEN '25–34'
WHEN age >= 35 AND age < 45 THEN '35–44'
WHEN age >= 45 AND age < 55 THEN '45–54'
WHEN age >= 55 AND age < 65 THEN '55–64'
WHEN age >= 65 THEN '65+'
END

Performance benefits

Custom calculations are optimized for performance. For small datasets (≤100,000 rows and ≤100MB), calculations run in the browser for faster responsiveness. Larger datasets are processed by the SQL warehouse. See Dataset optimization and caching for more details.

Create a custom calculation

This example creates a calculated measure based on the samples.nyctaxi.trips dataset. It assumes general knowledge about how to work with AI/BI dashboards. If you are unfamiliar with authoring AI/BI dashboards, see Create a dashboard to get started.

  1. Open an existing dataset or create a new one.

  2. Click Custom Calculation.

    The custom calculation button is highlighted in the upper-right corner of the results panel.

  3. A Create Calculation panel opens on the right side of the screen. In the Name text field, enter Cost per mile.

  4. (Optional) In the Description text field, enter “Uses the fare amount and trip distance to calculate cost per mile.”

  5. In the Expression field, enter the following:

    SQL
    try_divide(SUM(fare_amount), SUM(trip_distance))
  6. Click Create.

The custom calculations editor with the values from instructions filled in.

Add custom calculations to a metric view

Preview

This feature is in Public Preview.

You can define custom calculations on top of a dataset created by a metric view. Only the Results Table and Schema are shown when you open the dataset. Click Custom Calculation to define a new custom calculation. To define additional custom metrics that other data assets can use, make changes to the view definition. See Unity Catalog metric views.

To define a new metric view from the dashboard dataset editor, see Create a metric view.

View the schema

Click the Schema tab in the results panel to view the custom calculation and its associated comment.

Calculated measures are listed in the Measures section and marked by a Calculated measure icon fx. The value associated with a calculated measure is dynamically calculated when you set the GROUP BY in a visualization. You cannot see the value in the results table. Calculated dimensions appear in the Dimensions section.

A calculated measure appears in the schema tab.

Use a custom calculation in a visualization

You can use the previously created Cost per mile calculated measure in a visualization.

  1. Click Canvas. Then, place a new visualization widget on the canvas.
  2. Use the visualization configuration panel to edit the settings as follows:
    • Dataset: Taxicab data
    • Visualization: Bar
    • X axis:
      • Field: dropoff_zip
      • Scale Type: Categorical
      • Transform: None
    • Y axis:
      • Cost per mile
note

Table visualizations support calculated dimensions, but do not support calculated measures.

The following image shows the chart.

A bar chart showing cost per mile versus dropoff zipcode.

Visualizations with custom calculations automatically update when filters are applied. For example, adding a pickup_zip filter will update the visualization to show only data matching the selected values.

Edit a custom calculation

To edit a calculation:

  1. Click the Data tab and then click the dataset associated with the calculation you want to edit.
  2. Click the Schema tab in the results panel.
  3. Measures and Dimensions appear under the list of dataset fields. Click the Kebab menu icon. kebab menu to the right of the calculation you want to edit. Then, click Edit.
  4. In the Edit custom calculation panel, update the text fields that you want to edit. Then, click Update.

Delete a custom calculation

To delete a calculation:

  1. Click the Data tab and then click the dataset associated with the measure you want to edit.
  2. Click the Schema tab in the results panel.
  3. The Measures section appears under the list of fields. Click the Kebab menu icon. kebab menu to the right of the calculation that you want to edit. Then, click Delete.
  4. Click Delete in the Delete dialog that appears.

Limitations

To use custom calculations, the following must be true:

  • Columns used in the expression must belong to the same dataset.
  • Expressions that reference external tables or data sources aren't supported and might fail or return unexpected results.

Supported functions

The following tables list supported functions. Attempting to use an unsupported function results in an error.

Aggregate functions

All calculated measures must be aggregated. The following aggregation operations are supported:

Aggregation

Description

avg(expr) or mean(expr)

Returns the calculated mean in a column or expression.

count(*)

Returns the number of rows in a group.

count(DISTINCT expr)

Returns the number of unique rows in a group.

sum(expr)

Returns the total of values in a column or expression.

max(expr)

Returns the maximum value in a column or expression.

min(expr)

Returns the minimum value in a column or expression.

percentile(expr, percentage [,frequency])

Returns the exact percentile value of expr at the specified percentage in a group.

first(expr [,ignoreNull])

Returns the first value of expr for a group.

last(expr [,ignoreNull])

Returns the last value of expr for the group.

count_if

Returns the count of rows that satisfy a given condition.

median

Returns the median of a set of values.

stddev

Returns the standard deviation of a set of values.

variance

Returns the variance of a set of values.

Arithmetic operations

You can combine expressions with the following arithmetic operations:

Operation

Description

expr1 + expr2

Returns the sum of expr1 and expr2.

expr1 - expr2

Returns the difference when subtracting expr2 from expr1.

multiplier * multiplicand

Returns the product of two expressions.

dividend / divisor

Returns the result of dividing the dividend by the divisor.

- expr

Returns the negated value of the expression.

+ expr

Returns the value of the expression.

try_add(expr1, expr2)

Adds two values. If an error occurs, returns NULL.

try_subtract(expr1, expr2)

Subtracts expr2 from expr1. If an error occurs, returns NULL.

try_multiply(multiplier, multiplicand)

Multiplies two numbers. If an error occurs, returns NULL.

try_divide(dividend, divisor)

Divides the dividend by the divisor. If an error occurs, returns NULL.

pow or power

Returns the result of expr1 raised to the power of expr2.

Boolean functions and operators

Custom calculations support basic comparison and Boolean operators, such as =, ==, <=, >=, <, >, is null, AND, OR, NOT, !. You can also evaluate expressions using the following functions:

Function

Description

isnull(expr)

Returns true if the expr is NULL.

isnotnull(expr)

Returns true if expr is not NULL.

Cast functions

Use the following functions to cast values to a specified type:

Function

Description

cast(expr AS type)

Casts the value expr to the target data type type.

try_cast(expr AS type)

Casts the value expr to the target data type type safely.

Date, timestamp, and interval functions

Use the following functions to work with dates, timestamps, and intervals:

Function

Description

datediff(endDate, startDate)

Returns the number of days from startDate to endDate.

timestampdiff(unit, start, stop)

Returns the difference between two timestamps measured in units.

date_format(expr, fmt)

Converts a timestamp to a string in the format fmt.

timediff(unit, start, stop)

Returns the difference between two timestamps measured in units.

date_part

Extracts a specific part, such as year, month, or day, from a date or timestamp.

date_trunc

Truncates a date or timestamp to a specified unit, such as year or month.

String functions

Use the following functions to transform strings:

Function

Description

concat(expr1, expr2[, …])

Returns the concatenation of the arguments.

concat_ws(sep[, expr1[, …]])

Returns the concatenation of strings separated by sep.

Miscellaneous functions

The following functions are also supported:

Function

Description

CASE expr { WHEN opt1 THEN res1 } […] [ELSE def] END

Returns resN for the first optN that equals expr or def if none matches.

CASE { WHEN cond1 THEN res1 } […] [ELSE def] END

Returns resN for the first condN evaluating to true, or def if none found.

coalesce(expr1, expr2 [, …])

Returns the first non-null argument.

nvl(expr1, expr2)

Returns expr2 if expr1 is NULL, or expr1 otherwise.

Examples

The following examples demonstrate common uses for custom calculations. Each custom calculation appears in the dataset's schema on the data tab. On the canvas, you can choose the custom calculation as a field.

Conditionally filter and aggregate data

Use a CASE statement to conditionally aggregate data. The following example uses the samples.nyctaxi.trips dataset and calculates the sum of fares for all rides that start in the 10103 zip code.

SQL
SUM(CASE
WHEN pickup_zip=10103 THEN fare_amount
WHEN pickup_zip!=10103 THEN 0
END)

Construct strings

Use the CONCAT function to construct a new string value. See concat function and concat_ws function.

SQL
CONCAT(first_name, ' ', last_name)

Format dates

Use DATE_FORMAT to format date strings that appear in visualizations.

SQL
DATE_FORMAT(tpep_pickup_datetime, 'YYYY-MM-dd')