count_min_sketch
aggregate function
Applies to: Databricks SQL Databricks Runtime
Returns a count-min sketch of all values in the group in column
with the epsilon
, confidence
and seed
.
In Databricks SQL and Databricks Runtime 13.3 LTS and above this function supports named parameter invocation.
Syntax
count_min_sketch ( [ALL | DISTINCT] column, epsilon, confidence, seed ) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
Arguments
column
: An expression that evaluates to an integral numeric,STRING
, orBINARY
.epsilon
: ADOUBLE
literal greater than 0 describing the relative error.confidence
: ADOUBLE
literal greater than 0 and less than 1.seed
: AnINTEGER
literal.cond
: An optional boolean expression filtering the rows used for aggregation.
Returns
A BINARY
.
Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
If DISTINCT
is specified the function operates only on a unique set of expr
values.
Examples
-- Named parameter invocation
> SELECT hex(count_min_sketch(column => col, confidence => 0.5d, epsilon => 0.5d, seed => 1)) FROM VALUES (1), (2), (1) AS tab(col);
0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000
> SELECT hex(count_min_sketch(DISTINCT col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
0000000100000000000000020000000100000004000000005D8D6AB90000000000000000000000000000000100000000000000010000000000000000