
theta_union_agg aggregate function

Applies to: Databricks SQL, Databricks Runtime 18.0 and above

Merges multiple Theta Sketch buffers into a single result buffer using set union. Use this function to combine sketches built over different partitions or time periods.
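The set-union semantics can be illustrated with a toy Theta Sketch model in Python. This is a minimal sketch under stated assumptions. It is not the Databricks implementation and not its binary format.

- Values are hashed to 64-bit integers with BLAKE2b, a hypothetical choice.
- A sketch keeps only the hashes below a threshold `theta`.
- A union takes the smallest `theta`, pools the hashes below it, and trims the result back to `2**lg_nom_entries` entries.

The names `hash64`, `ThetaSketch`, `trim`, `build_sketch`, and `union` are illustrative only.

```python
import hashlib
from dataclasses import dataclass

THETA_MAX = 2**64  # theta == THETA_MAX means nothing has been sampled away yet


def hash64(value) -> int:
    """Hash a value to a 64-bit integer (hypothetical choice of hash function)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


@dataclass(frozen=True)
class ThetaSketch:
    theta: int         # sampling threshold on the 64-bit hash space
    hashes: frozenset  # retained hashes, all strictly below theta

    def estimate(self) -> float:
        # The retained hashes are a uniform sample at rate theta / THETA_MAX.
        return len(self.hashes) * THETA_MAX / self.theta


def trim(theta: int, hashes, k: int) -> ThetaSketch:
    """Keep hashes below theta; if more than k remain, lower theta to the (k+1)-th smallest."""
    kept = sorted(h for h in hashes if h < theta)
    if len(kept) > k:
        theta, kept = kept[k], kept[:k]
    return ThetaSketch(theta, frozenset(kept))


def build_sketch(values, lg_nom_entries: int = 12) -> ThetaSketch:
    """Toy stand-in for building a sketch from raw values."""
    return trim(THETA_MAX, {hash64(v) for v in values}, 2**lg_nom_entries)


def union(sketches, lg_nom_entries: int = 12) -> ThetaSketch:
    """Set union: take the smallest theta, pool hashes below it, trim to 2**lg_nom_entries."""
    sketches = list(sketches)
    theta = min(s.theta for s in sketches)
    pooled = frozenset().union(*(s.hashes for s in sketches))
    return trim(theta, pooled, 2**lg_nom_entries)


a = build_sketch([1, 2, 3])
b = build_sketch([3, 4, 5])
print(union([a, b]).estimate())  # 5.0
```

The overlapping value 3 produces the same hash in both inputs, so the union counts it once. That is what separates a sketch union from simply adding the two distinct counts (3 + 3 = 6).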

Syntax

theta_union_agg ( sketch [, lgNomEntries ] )

Arguments

  • sketch: A Theta Sketch in binary format, for example one produced by the theta_sketch_agg aggregate function.
  • lgNomEntries: An optional INTEGER literal specifying the log-base-2 of the nominal entries for the union buffer. Must be between 4 and 26, inclusive. The default is 12. Higher values provide better accuracy but use more memory.
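A little arithmetic makes the accuracy and memory trade-off of lgNomEntries concrete. The sketch below does two things:

- It converts lgNomEntries to the nominal entry count k = 2^lgNomEntries and applies the documented 4 to 26 range check.
- It applies the 1/sqrt(k) relative-standard-error figure. This is a commonly quoted rule of thumb for Theta Sketches once they are in estimation mode. It is included as an approximation, not as a guarantee stated by Databricks.

Both function names are hypothetical.

```python
import math


def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return k = 2**lgNomEntries after checking the documented range 4..26."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26, inclusive")
    return 2**lg_nom_entries


def approx_relative_std_error(lg_nom_entries: int = 12) -> float:
    """Rule-of-thumb relative standard error of an estimate: about 1 / sqrt(k)."""
    return 1 / math.sqrt(nominal_entries(lg_nom_entries))


for lg in (4, 12, 20, 26):
    print(f"lgNomEntries={lg:2d}  k={nominal_entries(lg):>10,}  RSE~{approx_relative_std_error(lg):.3%}")
```

With the default of 12, k is 4,096 and the rule of thumb gives roughly 1.6% relative standard error. Each +1 step in lgNomEntries doubles k, and with it roughly the number of hashes a sketch can retain. The error, however, shrinks only by a factor of about sqrt(2).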

Returns

A BINARY value containing the merged serialized Theta Sketch representing the union of all input sketches.

Notes

  • The union operation handles input sketches with different lgNomEntries values.
  • NULL values are ignored during aggregation.
  • To merge exactly two sketches, use the scalar theta_union function instead.
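The notes above can be checked against a self-contained toy model. This is again an illustration, not Databricks' implementation, and the names and the BLAKE2b hash are assumptions. In this model:

- A sketch is a (theta, hashes) pair.
- A NULL sketch is represented as None and skipped.
- The two inputs are built with different lgNomEntries values.
- Folding the inputs two at a time gives the same result as one multi-way union, even when trimming applies.

```python
import hashlib
from functools import reduce

THETA_MAX = 2**64  # theta before any sampling has happened


def hash64(value) -> int:
    # Hypothetical 64-bit hash; real Theta Sketches use their own hash function.
    return int.from_bytes(hashlib.blake2b(repr(value).encode(), digest_size=8).digest(), "big")


def trim(theta, hashes, k):
    """Return (theta, hashes) keeping at most k hashes, all below theta."""
    kept = sorted(h for h in hashes if h < theta)
    if len(kept) > k:
        theta, kept = kept[k], kept[:k]
    return theta, frozenset(kept)


def sketch_agg(values, lg_nom_entries=12):
    return trim(THETA_MAX, {hash64(v) for v in values}, 2**lg_nom_entries)


def union_agg(sketches, lg_nom_entries=12):
    """Multi-way union; None stands in for a NULL sketch and is skipped."""
    present = [s for s in sketches if s is not None]
    if not present:
        return None  # assumption: the real all-NULL behaviour is not modelled here
    theta = min(t for t, _ in present)
    pooled = frozenset().union(*(h for _, h in present))
    return trim(theta, pooled, 2**lg_nom_entries)


def estimate(sketch):
    theta, hashes = sketch
    return len(hashes) * THETA_MAX / theta


# Inputs built with different lgNomEntries, plus a NULL, merged with lgNomEntries=15.
s1 = sketch_agg([1, 2])
s2 = sketch_agg([2, 3], lg_nom_entries=20)
print(estimate(union_agg([s1, None, s2], lg_nom_entries=15)))  # 3.0

# Folding two sketches at a time matches the one-shot union, even when trimming applies.
parts = [sketch_agg(range(i, i + 500), lg_nom_entries=6) for i in (0, 250, 500)]
one_shot = union_agg(parts, lg_nom_entries=5)
pairwise = reduce(lambda x, y: union_agg([x, y], lg_nom_entries=5), parts)
print(one_shot == pairwise)  # True
```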


Examples

-- Merge sketches from different groups
> SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2), (3) AS tab(col)
UNION ALL
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (3), (4), (5) AS tab(col)
) t;
5

-- Merge sketches with custom lgNomEntries
> SELECT theta_sketch_estimate(theta_union_agg(sketch, 15)) FROM (
SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2) AS tab(col)
UNION ALL
SELECT theta_sketch_agg(col, 20) AS sketch FROM VALUES (2), (3) AS tab(col)
) t;
3