kll_sketch_agg_bigint aggregate function
Applies to: Databricks Runtime 18.0 and later
Creates a KLL (Karnin-Lang-Liberty) sketch for approximate quantile estimation on integer data with configurable accuracy.
Syntax
kll_sketch_agg_bigint ( expr [, k] )
Arguments
- expr: An integral numeric expression to aggregate.
- k: An optional INTEGER literal controlling sketch accuracy. Must be between 8 and 65535. The default is 200. Higher values provide better accuracy but use more memory.
Returns
A BINARY value containing the serialized KLL sketch for integer data.
Notes
- NULL values in expr are ignored during aggregation.
- The sketch provides approximate quantiles with a confidence level of about 99%.
- Sketches are mergeable, allowing distributed aggregation.
- Memory usage is approximately O(k) items regardless of input size.
Examples
SQL
-- Create sketch with default k=200
> SELECT kll_sketch_agg_bigint(value) FROM VALUES (1), (2), (3), (4), (5) AS T(value)
[binary data]
-- Create sketch with custom k=400 for higher accuracy
> SELECT kll_sketch_agg_bigint(value, 400) FROM VALUES (10), (20), (30) AS T(value)
[binary data]
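The serialized sketch is mainly useful as input to other functions. The following is a minimal sketch of that workflow, not a definitive reference: it assumes the companion functions kll_sketch_get_quantile_bigint(sketch, rank), kll_sketch_merge_bigint(left, right), and kll_sketch_get_n_bigint(sketch) are available in the same runtime with these argument orders. None of them is described on this page, so check their own reference pages before relying on them.

```sql
-- Assumption: kll_sketch_get_quantile_bigint takes the sketch and a
-- rank between 0 and 1; 0.5 requests the approximate median.
SELECT kll_sketch_get_quantile_bigint(kll_sketch_agg_bigint(value), 0.5) AS approx_median
FROM VALUES (1), (2), (3), (4), (5) AS T(value);

-- Assumption: kll_sketch_merge_bigint combines two sketches built
-- separately, and kll_sketch_get_n_bigint reports how many items
-- the combined sketch has seen.
WITH a AS (
  SELECT kll_sketch_agg_bigint(value) AS sk FROM VALUES (1), (2) AS T(value)
),
b AS (
  SELECT kll_sketch_agg_bigint(value) AS sk FROM VALUES (3), (4), (5) AS T(value)
)
SELECT kll_sketch_get_n_bigint(kll_sketch_merge_bigint(a.sk, b.sk)) AS total_items
FROM a CROSS JOIN b;
```

Both sketches in the second query use the default k, which avoids any question of how sketches built with different k values combine.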