count_min_sketch aggregate function

Returns a count-min sketch of all values of expr in the group, built with the specified epsilon, confidence, and seed.

Syntax

count_min_sketch ( [ALL | DISTINCT] expr, epsilon, confidence, seed ) [FILTER ( WHERE cond ) ]

Arguments

  • expr: An expression that evaluates to an integral numeric, STRING, or BINARY.

  • epsilon: A DOUBLE literal greater than 0 describing the relative error.

  • confidence: A DOUBLE literal greater than 0 and less than 1 giving the probability that an estimate stays within the error bound set by epsilon.

  • seed: An INTEGER literal used to seed the sketch's hash functions.

  • cond: An optional boolean expression filtering the rows used for aggregation.
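
To give a feel for how epsilon and confidence affect the size of the result, the following Python sketch computes the table dimensions a sketch would get. The sizing rule is an assumption taken from the open-source Apache Spark CountMinSketch implementation and may not match this function exactly. The helper name sketch_dimensions is hypothetical.

```python
import math


def sketch_dimensions(epsilon, confidence):
    """Return (width, depth) of the counter table.

    Assumed sizing rule (borrowed from Apache Spark's CountMinSketch):
      width = ceil(2 / epsilon)
      depth = ceil(-ln(1 - confidence) / ln(2))
    """
    if not epsilon > 0:
        raise ValueError("epsilon must be greater than 0")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    width = math.ceil(2 / epsilon)
    depth = math.ceil(-math.log1p(-confidence) / math.log(2))
    return width, depth


print(sketch_dimensions(0.25, 0.95))  # (8, 5)
```

Under this rule, a smaller epsilon widens each row and a higher confidence adds rows, so the returned BINARY grows as either bound is tightened.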

Returns

A BINARY.

A count-min sketch is a probabilistic data structure that estimates how often each value occurs (its frequency) using sub-linear space. Estimates may exceed the true count but never fall below it.

If DISTINCT is specified, the function operates only on the unique set of expr values.
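
As an illustration of the data structure itself, here is a minimal count-min sketch in Python. It is a teaching sketch only: the class name TinyCountMinSketch, the BLAKE2-based hashing, and the constructor parameters are assumptions made for this example and are not how the SQL function is implemented.

```python
import hashlib


class TinyCountMinSketch:
    """A depth x width table of counters, one hash function per row."""

    def __init__(self, width, depth, seed):
        self.width = width
        self.depth = depth
        self.seed = seed
        self.total = 0
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        # Derive a per-row hash by mixing the seed and row index into the key.
        key = f"{self.seed}:{row}:{item!r}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item):
        # Increment one counter in every row.
        self.total += 1
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += 1

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest estimate and is never below the true count.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )


sketch = TinyCountMinSketch(width=4, depth=1, seed=1)
for value in (1, 2, 1):
    sketch.add(value)
print(sketch.total, sketch.estimate(1), sketch.estimate(2))
```

With DISTINCT, each unique value would be added once, so the same loop would run over (1, 2) instead of (1, 2, 1).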

Examples

> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000
> SELECT hex(count_min_sketch(DISTINCT col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
0000000100000000000000020000000100000004000000005D8D6AB90000000000000000000000000000000100000000000000010000000000000000
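
The hex strings above can be unpacked to see what the sketch holds. The Python sketch below assumes the big-endian layout used by the open-source Apache Spark CountMinSketch serializer (version, total count, depth, width, one hash seed per row, then the counters). That layout is an assumption here, not a documented contract of this function, and decode_sketch is a hypothetical helper.

```python
import struct

# Output of the first example, split by assumed field.
EXAMPLE_HEX = (
    "00000001"          # version (32-bit int)
    "0000000000000003"  # total count (64-bit int)
    "00000001"          # depth (32-bit int)
    "00000004"          # width (32-bit int)
    "000000005D8D6AB9"  # hash seed for row 0 (64-bit int)
    "0000000000000000"  # counter [0][0]
    "0000000000000002"  # counter [0][1]
    "0000000000000001"  # counter [0][2]
    "0000000000000000"  # counter [0][3]
)


def decode_sketch(hex_string):
    """Decode a serialized sketch under the assumed layout."""
    raw = bytes.fromhex(hex_string)
    header = ">iqii"  # big-endian: int, long, int, int
    version, total, depth, width = struct.unpack_from(header, raw, 0)
    offset = struct.calcsize(header)
    hash_seeds = struct.unpack_from(f">{depth}q", raw, offset)
    offset += 8 * depth
    counters = struct.unpack_from(f">{depth * width}q", raw, offset)
    table = [list(counters[r * width:(r + 1) * width]) for r in range(depth)]
    return {
        "version": version,
        "total_count": total,
        "depth": depth,
        "width": width,
        "hash_seeds": list(hash_seeds),
        "table": table,
    }


info = decode_sketch(EXAMPLE_HEX)
print(info["total_count"], info["depth"], info["width"], info["table"])
# 3 1 4 [[0, 2, 1, 0]]
```

Read this way, the first example records three values in a 1 x 4 table, with one value counted twice. The DISTINCT example differs only in the total count (2) and in that counter (1 instead of 2).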