hll_sketch_agg aggregate function
Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above
This function uses the HyperLogLog algorithm to compute a probabilistic approximation of the number of unique values in a given column, and returns the result as a binary representation known as a sketch buffer. This binary representation is suitable for persistence.
Queries can use the resulting buffers to compute approximate unique counts with the hll_sketch_estimate function.
The hll_union and hll_union_agg functions can also combine sketches by consuming and merging these buffers as inputs.
The implementation uses the Apache DataSketches library. See HLL for more information.
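The following is a minimal sketch of the store-then-merge workflow described above, not a definitive pattern. The view name daily_sketches and the columns day, user_id, and sketch are hypothetical, and the temporary view only stands in for a table where the buffers would actually be persisted.

```sql
-- Hypothetical names throughout: daily_sketches, day, user_id, sketch.
-- Build one sketch buffer per day.
CREATE OR REPLACE TEMPORARY VIEW daily_sketches AS
SELECT day, hll_sketch_agg(user_id) AS sketch
  FROM VALUES ('2024-01-01', 1), ('2024-01-01', 2),
              ('2024-01-02', 2), ('2024-01-02', 3) AS events(day, user_id)
 GROUP BY day;

-- Approximate distinct count per day, read from the stored buffers.
SELECT day, hll_sketch_estimate(sketch) AS approx_users
  FROM daily_sketches;

-- Merge all daily buffers into one sketch, then estimate the overall
-- approximate distinct count without rescanning the raw rows.
SELECT hll_sketch_estimate(hll_union_agg(sketch)) AS approx_users_total
  FROM daily_sketches;
```

Because the merge operates on the buffers rather than the raw values, a value that appears on more than one day (user_id 2 here) contributes only once to the merged estimate.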
Syntax
hll_sketch_agg ( expr [, lgConfigK ] )
This function can also be invoked as a window function using the OVER clause.
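As an illustration of the window form, the sketch below attaches a per-partition estimate to every row. The column names grp and col are placeholders, and the example assumes a window with only a PARTITION BY clause.

```sql
-- Placeholder columns: grp, col.
-- Each row receives the approximate distinct count of col within its grp partition.
SELECT grp,
       col,
       hll_sketch_estimate(
         hll_sketch_agg(col) OVER (PARTITION BY grp)
       ) AS approx_distinct_in_grp
  FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3) AS tab(grp, col);
```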
Arguments
- expr: An expression of type INT, BIGINT, STRING, or BINARY against which unique counting will occur.
- lgConfigK: An optional INT constant between 4 and 21 inclusive, with default 12. The log-base-2 of K, where K is the number of buckets or slots for the sketch.
Any NULL in expr is ignored.
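A small sketch of both arguments in use, assuming illustrative input values. It passes an explicit lgConfigK of 8 (that is, K = 2^8 = 256 buckets) and includes NULL rows, which do not contribute to the sketch.

```sql
-- NULL inputs are skipped, so only the values 1 and 2 feed the sketch.
-- lgConfigK = 8 requests 2^8 = 256 buckets instead of the default 2^12 = 4096.
SELECT hll_sketch_estimate(hll_sketch_agg(col, 8)) AS approx_distinct
  FROM VALUES (1), (NULL), (2), (NULL), (2) AS tab(col);
```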
Returns
A non-NULL BINARY buffer containing the HyperLogLog sketch computed as a result of consuming and aggregating all input values in the aggregation group.
Examples
> SELECT hll_sketch_estimate(hll_sketch_agg(col, 12))
    FROM VALUES (1), (1), (2), (2), (3) tab(col);
  3
> SELECT hll_sketch_estimate(hll_sketch_agg(col))
    FROM VALUES (1), (1), (2), (2), (3) tab(col);
  3