hll_union_agg aggregate function
Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above
Combines a group of HyperLogLog sketches into a single sketch.
Queries can pass the resulting buffer to the hll_sketch_estimate function to compute an approximate count of unique values.
The implementation uses the Apache DataSketches library. See the DataSketches HLL documentation for more information.
Syntax
hll_union_agg ( expr [, allowDifferentLgConfigK ] )
This function can also be invoked as a window function using the OVER clause.
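As a rough illustration of the window form, the query below attaches a per-region unique count to every daily row. The table alias visits, its columns region, visit_day, and user_id, and the sample values are hypothetical and exist only for this sketch.

```sql
-- Hypothetical input: one row per visit, aggregated into one sketch per (region, visit_day).
WITH daily AS (
  SELECT region, visit_day, hll_sketch_agg(user_id) AS sketch
    FROM VALUES ('eu', 1, 101), ('eu', 1, 102), ('eu', 2, 102), ('us', 1, 201)
         AS visits(region, visit_day, user_id)
   GROUP BY region, visit_day
)
-- Merge all daily sketches of the same region without collapsing the rows.
SELECT region,
       visit_day,
       hll_sketch_estimate(hll_union_agg(sketch) OVER (PARTITION BY region)) AS approx_users_in_region
  FROM daily;
```

Each output row keeps its own visit_day, while the estimate reflects the merged sketch for the whole region.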
Arguments
expr
: A BINARY expression holding a sketch generated by hll_sketch_agg.
allowDifferentLgConfigK
: An optional BOOLEAN constant expression controlling whether to allow merging sketches with different lgConfigK values. The default value is false.
Returns
A BINARY buffer containing the HyperLogLog sketch computed as a result of combining the input expressions of the same group.
When the allowDifferentLgConfigK parameter is true, the result sketch uses the smaller of the two provided lgConfigK values.
Examples
> SELECT hll_sketch_estimate(hll_union_agg(sketch, true))
    FROM (SELECT hll_sketch_agg(col) as sketch
            FROM VALUES (1) AS tab(col)
          UNION ALL
          SELECT hll_sketch_agg(col, 20) as sketch
            FROM VALUES (1) AS tab(col));
  1
> SELECT hll_sketch_estimate(hll_union_agg(sketch, false))
    FROM (SELECT hll_sketch_agg(col) as sketch
            FROM VALUES (1) AS tab(col)
          UNION ALL
          SELECT hll_sketch_agg(col, 20) as sketch
            FROM VALUES (1) AS tab(col));
  error
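A common reason to merge sketches is rolling up pre-aggregated data. The following is a minimal sketch of that pattern, assuming a hypothetical set of visits with columns region, visit_day, and user_id. The names and values are illustrative only and do not come from the function reference.

```sql
-- Step 1 (hypothetical data): build one sketch per (region, visit_day).
WITH daily AS (
  SELECT region, visit_day, hll_sketch_agg(user_id) AS sketch
    FROM VALUES ('eu', 1, 101), ('eu', 1, 102), ('eu', 2, 102), ('us', 1, 201)
         AS visits(region, visit_day, user_id)
   GROUP BY region, visit_day
)
-- Step 2: merge the daily sketches of each region and estimate the unique users.
-- All sketches here use the default lgConfigK, so allowDifferentLgConfigK is omitted.
SELECT region,
       hll_sketch_estimate(hll_union_agg(sketch)) AS approx_users
  FROM daily
 GROUP BY region;
```

Because a user who appears on several days is represented in each daily sketch, merging the sketches counts that user once, whereas summing the daily estimates would count the user once per day.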