approx_count_distinct aggregate function (Databricks SQL)

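Returns the estimated number of distinct values in expr within the group. The estimate is computed with the HyperLogLog++ (HLL++) cardinality estimation algorithm, which trades exactness for bounded memory. Use count(DISTINCT expr) when an exact result is required.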

Syntax

approx_count_distinct(expr[, relativeSD]) [FILTER ( WHERE cond ) ]

Arguments

  • expr: Can be of any type for which equivalence is defined.

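  • relativeSD: An optional DOUBLE literal that defines the maximum relative standard deviation allowed. The default is 0.05. Smaller values produce more accurate estimates but use more memory.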

  • cond: An optional boolean expression filtering the rows used for aggregation.
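The following illustrates passing relativeSD explicitly. It assumes the literal 0.01, which requests a tighter estimate than the 0.05 default. On an input this small, the estimate matches the exact count:

> SELECT approx_count_distinct(col1, 0.01) FROM VALUES (1), (1), (2), (2), (3) tab(col1);
 3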

Returns

A BIGINT.
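The result is never NULL. NULL values of expr are skipped, and aggregating over no rows yields 0:

> SELECT approx_count_distinct(col1) FROM VALUES (1), (NULL), (2), (NULL) tab(col1);
 2
> SELECT approx_count_distinct(col1) FROM VALUES (1), (2) tab(col1) WHERE col1 > 5;
 0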

Examples

> SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1);
 3
> SELECT approx_count_distinct(col1) FILTER(WHERE col2 = 10)
    FROM VALUES (1, 10), (1, 10), (2, 10), (2, 10), (3, 10), (1, 12) AS tab(col1, col2);
 3
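Because the estimate is computed within each group, combining the function with GROUP BY yields one estimate per group. This example reuses the rows from the previous one:

> SELECT col2, approx_count_distinct(col1)
    FROM VALUES (1, 10), (1, 10), (2, 10), (2, 10), (3, 10), (1, 12) AS tab(col1, col2)
    GROUP BY col2 ORDER BY col2;
 10 3
 12 1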