hll_union function
Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above
This function uses the HyperLogLog algorithm to combine two sketches into a single sketch.
Queries can use the resulting buffers to compute approximate unique counts as long integers with the hll_sketch_estimate function.
The implementation uses the Apache DataSketches library. See the library's HLL documentation for more information.
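As an illustration of the typical flow, the following sketch builds two sketches separately with hll_sketch_agg, merges them with hll_union, and reads the approximate distinct count with hll_sketch_estimate. The CTE names (monday, tuesday), the column user_id, and the inline values are hypothetical and only serve the example.

```sql
-- Hypothetical data: two groups of user IDs that overlap on the value 3.
WITH monday AS (
  SELECT hll_sketch_agg(user_id) AS sketch
  FROM VALUES (1), (2), (3) AS t(user_id)
),
tuesday AS (
  SELECT hll_sketch_agg(user_id) AS sketch
  FROM VALUES (3), (4), (5) AS t(user_id)
)
-- Each CTE yields exactly one row, so the cross join yields one row.
SELECT hll_sketch_estimate(hll_union(monday.sketch, tuesday.sketch)) AS approx_users
FROM monday CROSS JOIN tuesday;
```

Because the value 3 appears in both inputs, the estimate should be close to 5 rather than 6. The union counts distinct values across both sketches, not the sum of the two individual counts.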
Syntax
hll_union ( expr1, expr2 [, allowDifferentLgConfigK ] )
Arguments
- exprN: A BINARY expression holding a sketch generated by hll_sketch_agg.
- allowDifferentLgConfigK: An optional BOOLEAN expression controlling whether to allow merging two sketches with different lgConfigK values. The default value is false.
Returns
A BINARY buffer containing the HyperLogLog sketch computed as a result of combining the input expressions.
When the allowDifferentLgConfigK parameter is true, the result sketch uses the smaller of the two provided lgConfigK values.
Examples
SQL
> SELECT hll_sketch_estimate(
  hll_union(
    hll_sketch_agg(col1),
    hll_sketch_agg(col2)))
  FROM VALUES
    (1, 4),
    (1, 4),
    (2, 5),
    (2, 5),
    (3, 6) AS tab(col1, col2);
  6
> SELECT hll_sketch_estimate(
  hll_union(
    hll_sketch_agg(col1,  4),
    hll_sketch_agg(col2, 21)))
  FROM VALUES
    (1, 4),
    (1, 4),
    (2, 5),
    (2, 5),
    (3, 6) AS tab(col1, col2);
  error
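The last example fails because the two sketches were built with different lgConfigK values (4 and 21) and allowDifferentLgConfigK defaults to false. As a sketch of the alternative, the same query with the third argument set to true should merge the sketches instead of raising an error. The expected output is left out here because the exact estimate was not verified.

```sql
-- Same data as above, but merging sketches with different lgConfigK
-- values is explicitly allowed through the third argument.
SELECT hll_sketch_estimate(
  hll_union(
    hll_sketch_agg(col1,  4),
    hll_sketch_agg(col2, 21),
    true))
  FROM VALUES
    (1, 4),
    (1, 4),
    (2, 5),
    (2, 5),
    (3, 6) AS tab(col1, col2);
```

As described under Returns, the merged sketch uses the smaller lgConfigK (4 in this case), so the accuracy of later estimates is bounded by the coarser of the two inputs.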