`approx_top_k_estimate` function

Applies to: Databricks Runtime 18.1 and above

Returns the top K most frequent items with their estimated counts from a sketch state produced by approx_top_k_accumulate or approx_top_k_combine.

Syntax

approx_top_k_estimate ( state [, k ] )

Arguments

state: A sketch state STRUCT (such as from approx_top_k_accumulate or approx_top_k_combine).
k: An optional positive INTEGER literal specifying the number of top items to return. Must be greater than 0 and less than or equal to maxItemsTracked of the input sketch. The default is 5.

Returns

An ARRAY of STRUCT values, where each struct contains:

item: The frequent item value (same type as the original input expression)
count: LONG — the estimated frequency count

The array is sorted by count in descending order.

Notes

The k parameter must be a constant value.
The k value must not exceed the maxItemsTracked value of the input sketch.
NULL values are included in results when they are among the top K most frequent.
The frequency counts are estimates; accuracy depends on the maxItemsTracked parameter used when creating the sketch.

Error messages

Examples

SQL
-- Get default top-5 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
[{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

-- Get top-2 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES ('a'), ('b'), ('c'), ('c'), ('c'), ('c'), ('d'), ('d') AS tab(expr);
[{"item":"c","count":4},{"item":"d","count":2}]

-- Results include NULL when it is among top K
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 3) FROM VALUES (0), (0), (1), (1), (1), (NULL) AS tab(expr);
[{"item":1,"count":3},{"item":0,"count":2},{"item":null,"count":1}]

Syntax​

Arguments​

Returns​

Notes​

Error messages​

Examples​

Related​