Pular para o conteúdo principal

approx_top_k_estimate function

Applies to: check marked yes Databricks Runtime 18.1 and above

Returns the top K most frequent items with their estimated counts from a sketch state produced by approx_top_k_accumulate or approx_top_k_combine.

Syntax

approx_top_k_estimate ( state [, k ] )

Arguments

  • state: A sketch state STRUCT (such as from approx_top_k_accumulate or approx_top_k_combine).
  • k: An optional positive INTEGER literal specifying the number of top items to return. Must be greater than 0 and less than or equal to maxItemsTracked of the input sketch. The default is 5.

Returns

An ARRAY of STRUCT values, where each struct contains:

  • item: The frequent item value (same type as the original input expression)
  • count: LONG — the estimated frequency count

The array is sorted by count in descending order.

Notes

  • The k parameter must be a constant value.
  • The k value must not exceed the maxItemsTracked value of the input sketch.
  • NULL values are included in results when they are among the top K most frequent.
  • The frequency counts are estimates; accuracy depends on the maxItemsTracked parameter used when creating the sketch.

Error messages

Examples

SQL
-- Get default top-5 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
[{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

-- Get top-2 items
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES ('a'), ('b'), ('c'), ('c'), ('c'), ('c'), ('d'), ('d') AS tab(expr);
[{"item":"c","count":4},{"item":"d","count":2}]

-- Results include NULL when it is among top K
> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 3) FROM VALUES (0), (0), (1), (1), (1), (NULL) AS tab(expr);
[{"item":1,"count":3},{"item":0,"count":2},{"item":null,"count":1}]