Applies to: Databricks SQL, Databricks Runtime
Returns the estimated number of distinct values in expr within the group.
The implementation uses the dense version of the HyperLogLog++ (HLL++) algorithm, a state-of-the-art cardinality estimation algorithm.
By default, results are accurate to within 5%, which is the default maximum relative standard deviation (relativeSD = 0.05). You can change this tolerance with the relativeSD parameter described below.
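As a rough guide to what relativeSD trades off, the textbook HyperLogLog error bound relates the relative standard error to the number of registers m in the sketch. This is a sketch under that assumption only; the exact constant and register-sizing rule Databricks uses are not stated here.

$$
\sigma_{\text{rel}} \approx \frac{1.04}{\sqrt{m}}, \quad m = 2^{p}
\;\Longrightarrow\;
m \ge \left(\frac{1.04}{\text{relativeSD}}\right)^{2}
$$

For the default relativeSD = 0.05, this gives m ≥ (1.04 / 0.05)² = 20.8² ≈ 432.6, so the smallest power of two is m = 2⁹ = 512 registers, for an expected error of 1.04 / √512 ≈ 0.046. Because m grows with the inverse square of relativeSD, halving the allowed error roughly quadruples the sketch size.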
approx_count_distinct(expr[, relativeSD]) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER clause.
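A minimal sketch of the two invocation forms, aggregate and window. The inline table tab and its columns grp and col1 are hypothetical, chosen only so the queries run without any existing tables.

```sql
-- Aggregate form: one estimate for the whole group (here, the entire inline table).
SELECT approx_count_distinct(col1) AS distinct_col1
FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3) AS tab(grp, col1);

-- Window form: each row receives the estimate computed over its partition,
-- so every row with grp = 'a' carries the same value.
SELECT grp,
       col1,
       approx_count_distinct(col1) OVER (PARTITION BY grp) AS distinct_in_grp
FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3) AS tab(grp, col1);
```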
expr: Can be of any type for which equivalence is defined.
relativeSD: Defines the maximum relative standard deviation allowed.
cond: An optional boolean expression filtering the rows used for aggregation.
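The queries below sketch the optional arguments: an explicit relativeSD and a FILTER condition. The inline table tab and its columns col1 and col2 are hypothetical; the row (4, 12) is there so that the FILTER visibly changes which rows are counted.

```sql
-- relativeSD = 0.01 asks for roughly 1% relative standard deviation,
-- at the cost of a larger sketch than the 0.05 default.
SELECT approx_count_distinct(col1, 0.01) AS distinct_col1
FROM VALUES (1, 10), (1, 10), (2, 10), (2, 10), (3, 10), (4, 12) AS tab(col1, col2);

-- FILTER: only rows where cond (col2 = 10) is true feed the estimate,
-- so the value 4 from the row (4, 12) is not counted.
SELECT approx_count_distinct(col1) FILTER (WHERE col2 = 10) AS distinct_col1_tag10
FROM VALUES (1, 10), (1, 10), (2, 10), (2, 10), (3, 10), (4, 12) AS tab(col1, col2);
```

To gauge the estimation error on your own data, you can run the exact count(DISTINCT col1) over the same rows and compare the two results.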