approx_top_k_estimate function
Applies to: Databricks Runtime 18.1 and above
Returns the top K most frequent items with their estimated counts from a sketch state produced by approx_top_k_accumulate or approx_top_k_combine.
Syntax
approx_top_k_estimate ( state [, k ] )
Arguments
- state: A sketch state
STRUCT(such as from approx_top_k_accumulate or approx_top_k_combine). - k: An optional positive
INTEGERliteral specifying the number of top items to return. Must be greater than 0 and less than or equal tomaxItemsTrackedof the input sketch. The default is 5.
Returns
An ARRAY of STRUCT values, where each struct contains:
- item: The frequent item value (same type as the original input expression)
- count:
LONG— the estimated frequency count
The array is sorted by count in descending order.
Notes
- The
kparameter must be a constant value. - The
kvalue must not exceed themaxItemsTrackedvalue of the input sketch. NULLvalues are included in results when they are among the top K most frequent.- The frequency counts are estimates; accuracy depends on the
maxItemsTrackedparameter used when creating the sketch.
Error messages
Examples
SQL
-- Get default top-5 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
[{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]
-- Get top-2 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES ('a'), ('b'), ('c'), ('c'), ('c'), ('c'), ('d'), ('d') AS tab(expr);
[{"item":"c","count":4},{"item":"d","count":2}]
-- Results include NULL when it is among top K
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 3) FROM VALUES (0), (0), (1), (1), (1), (NULL) AS tab(expr);
[{"item":1,"count":3},{"item":0,"count":2},{"item":null,"count":1}]