theta_union function
Applies to: Databricks SQL, Databricks Runtime 18.0 and above
Merges exactly two Theta Sketch binary representations using set union.
Syntax
theta_union ( first, second [, lgNomEntries ] )
Arguments
- first: A Theta Sketch in binary format.
- second: A Theta Sketch in binary format.
- lgNomEntries: An optional INTEGER literal specifying the log-base-2 of the nominal entries for the union buffer. Must be between 4 and 26, inclusive. The default is 12.
Returns
A BINARY value containing the serialized Theta Sketch representing the union of the two input sketches.
Notes
- The union operation handles input sketches with different lgNomEntries values.
- To merge more than two sketches, use the theta_union_agg aggregate function instead.
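As a rough illustration of the second note, the sketch below merges three sketches in one pass with theta_union_agg rather than nesting theta_union calls. It assumes that theta_union_agg accepts a single column of sketches as its argument; confirm the exact signature in that function's own reference page. The inline VALUES tables and the sketch alias are hypothetical names used only for this illustration.

```sql
-- Illustrative only: build three sketches, stack them as rows, then union them.
-- Assumption: theta_union_agg takes one sketch column as its argument.
> SELECT theta_sketch_estimate(theta_union_agg(sketch))
    FROM (SELECT theta_sketch_agg(col) AS sketch FROM VALUES (1), (2) tab(col)
          UNION ALL
          SELECT theta_sketch_agg(col) AS sketch FROM VALUES (2), (3) tab(col)
          UNION ALL
          SELECT theta_sketch_agg(col) AS sketch FROM VALUES (4) tab(col));
```

For exactly two sketches, theta_union remains the simpler choice because it needs no intermediate rows.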
Error messages
Examples
-- Union two sketches
> SELECT theta_sketch_estimate(theta_union(theta_sketch_agg(col1), theta_sketch_agg(col2)))
FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2);
6
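The optional third argument can also be passed explicitly. The following variant of the example above is a minimal sketch that sets lgNomEntries to 14, a value chosen arbitrarily from the documented range of 4 to 26. The expected result is not shown because it has not been verified here.

```sql
-- Union two sketches with an explicit union buffer size (lgNomEntries = 14)
> SELECT theta_sketch_estimate(theta_union(theta_sketch_agg(col1), theta_sketch_agg(col2), 14))
    FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) tab(col1, col2);
```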