Skip to main content

approx_top_k_combine aggregate function

Applies to: check marked yes Databricks Runtime 18.1 and above

Combines multiple sketch states produced by approx_top_k_accumulate into a single merged sketch. Use this function to combine sketches from different partitions, time periods, or data sources.

Syntax

approx_top_k_combine ( state [, maxItemsTracked ] )

Arguments

  • state: A sketch state STRUCT (such as from approx_top_k_accumulate).
  • maxItemsTracked: An optional positive INTEGER literal specifying the maximum number of items to track in the combined sketch. Must be between 1 and 1,000,000, inclusive. When specified, this value is used for the output sketch. When omitted, all input sketches must have the same maxItemsTracked value, and that value is used for the output.

Returns

A STRUCT value containing the merged sketch state with the same schema as the output of approx_top_k_accumulate.

Notes

  • When maxItemsTracked is not specified, all input sketches must have the same maxItemsTracked value; otherwise an error is thrown.
  • When maxItemsTracked is specified, sketches with different maxItemsTracked values can be combined.
  • All input sketches must have the same item data type; otherwise an error is thrown.
  • NULL values are preserved and combined across sketches.

Error messages

Examples

SQL
-- Combine sketches from different data sources with explicit maxItemsTracked
> SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (
SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr)
UNION ALL
SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr)
);
[{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

-- Combine sketches with same maxItemsTracked (no explicit size needed)
> SELECT approx_top_k_estimate(approx_top_k_combine(sketch), 3) FROM (
SELECT approx_top_k_accumulate(expr, 100) AS sketch FROM VALUES ('a'), ('a'), ('b') AS tab(expr)
UNION ALL
SELECT approx_top_k_accumulate(expr, 100) AS sketch FROM VALUES ('b'), ('c'), ('c'), ('c') AS tab(expr)
);
[{"item":"c","count":3},{"item":"a","count":2},{"item":"b","count":2}]