kll_sketch_agg_bigint

Aggregate function: returns the compact binary representation of an Apache DataSketches KllLongsSketch built from the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535); larger values improve accuracy at the cost of a larger sketch.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.kll_sketch_agg_bigint(col=<col>, k=<k>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or column name | The column containing bigint values to aggregate. |
| k | pyspark.sql.Column or int, optional | The k parameter that controls size and accuracy (default 200, range 8-65535). |
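
Because k is only accepted within 8-65535, code that reads k from configuration or user input may want to check it before building the aggregate expression. The following is a minimal sketch of such a check in plain Python. The helper name `checked_k` and the constants are hypothetical and are not part of PySpark or Databricks; only the default and range come from the description above.

```python
# Hypothetical helper, not part of pyspark or Databricks.
# The bounds and default mirror the documented values for k.
K_MIN, K_MAX, K_DEFAULT = 8, 65535, 200


def checked_k(k=None):
    """Return k if it is an int in the documented range.

    Returns the documented default when k is None.
    Raises TypeError for non-int input and ValueError for out-of-range input.
    """
    if k is None:
        return K_DEFAULT
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int):
        raise TypeError(f"k must be an int, got {type(k).__name__}")
    if not K_MIN <= k <= K_MAX:
        raise ValueError(f"k must be in [{K_MIN}, {K_MAX}], got {k}")
    return k
```

The result could then be passed as the `k` argument, for example `dbf.kll_sketch_agg_bigint(col="value", k=checked_k(400))`.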

Returns

pyspark.sql.Column: The binary representation of the KllLongsSketch.

Examples

Python
from pyspark.databricks.sql import functions as dbf
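# Build a single-column DataFrame; the column is named "value" by default.
df = spark.createDataFrame([1, 2, 3, 4, 5], "BIGINT")
# Aggregate the column into one serialized KLL sketch (a binary value).
result = df.agg(dbf.kll_sketch_agg_bigint("value")).first()[0]
result is not None and len(result) > 0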
Output
True
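
Since this is an aggregate function, it can presumably also be combined with `groupBy` to build one sketch per group, with k set explicitly. The following is a sketch under stated assumptions: it only runs where `pyspark.databricks.sql.functions` is available (for example, on a Databricks cluster), and the helper name `sketch_per_group`, the output column name `sketch`, and the input column names are hypothetical.

```python
def sketch_per_group(df, group_col, value_col, k=200):
    """Build one KLL sketch per group (hypothetical helper).

    The import is deferred so that defining this function does not
    require the Databricks-specific functions module to be installed.
    """
    from pyspark.databricks.sql import functions as dbf

    # One row per distinct group_col value, each holding a binary sketch.
    return df.groupBy(group_col).agg(
        dbf.kll_sketch_agg_bigint(col=value_col, k=k).alias("sketch")
    )
```

On a cluster, `sketch_per_group(df, "key", "value", k=400)` would be expected to return a DataFrame with the grouping column and a binary `sketch` column.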