approx_top_k_combine aggregate function
Applies to: Databricks Runtime 18.1 and above
Combines multiple sketch states produced by approx_top_k_accumulate into a single merged sketch. Use this function to combine sketches from different partitions, time periods, or data sources.
Syntax
approx_top_k_combine ( state [, maxItemsTracked ] )
Arguments
- state: A sketch state
STRUCT(such as from approx_top_k_accumulate). - maxItemsTracked: An optional positive
INTEGERliteral specifying the maximum number of items to track in the combined sketch. Must be between 1 and 1,000,000, inclusive. When specified, this value is used for the output sketch. When omitted, all input sketches must have the samemaxItemsTrackedvalue, and that value is used for the output.
Returns
A STRUCT value containing the merged sketch state with the same schema as the output of approx_top_k_accumulate.
Notes
- When
maxItemsTrackedis not specified, all input sketches must have the samemaxItemsTrackedvalue; otherwise an error is thrown. - When
maxItemsTrackedis specified, sketches with differentmaxItemsTrackedvalues can be combined. - All input sketches must have the same item data type; otherwise an error is thrown.
NULLvalues are preserved and combined across sketches.
Error messages
- APPROX_TOP_K_SKETCH_SIZE_NOT_MATCH
- APPROX_TOP_K_SKETCH_TYPE_NOT_MATCH
- APPROX_TOP_K_NON_POSITIVE_ARG
- APPROX_TOP_K_MAX_ITEMS_TRACKED_EXCEEDS_LIMIT
Examples
SQL
-- Combine sketches from different data sources with explicit maxItemsTracked
> SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (
SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr)
UNION ALL
SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr)
);
[{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]
-- Combine sketches with same maxItemsTracked (no explicit size needed)
> SELECT approx_top_k_estimate(approx_top_k_combine(sketch), 3) FROM (
SELECT approx_top_k_accumulate(expr, 100) AS sketch FROM VALUES ('a'), ('a'), ('b') AS tab(expr)
UNION ALL
SELECT approx_top_k_accumulate(expr, 100) AS sketch FROM VALUES ('b'), ('c'), ('c'), ('c') AS tab(expr)
);
[{"item":"c","count":3},{"item":"a","count":2},{"item":"b","count":2}]