kll_sketch_agg_double
Aggregate function: returns the compact binary representation of an Apache DataSketches KllDoublesSketch built from the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535).
Syntax
Python
from pyspark.databricks.sql import functions as dbf
dbf.kll_sketch_agg_double(col=<col>, k=<k>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | | The column containing double values to aggregate. |
| `k` | | Optional. The k parameter that controls the size and accuracy of the sketch (default 200, range 8-65535). |
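To give a rough sense of how k trades size for accuracy, the following sketch estimates the normalized rank error of a KLL sketch for a given k. It is not part of the Databricks API: the helper name `approx_kll_rank_error` is hypothetical, and the fit constants are assumed to match the empirical approximation used by the Apache DataSketches library, so treat the numbers as indicative only.

```python
def approx_kll_rank_error(k: int = 200, pmf: bool = False) -> float:
    """Estimate the normalized rank error of a KLL sketch for a given k.

    Hypothetical helper for illustration only. The constants below are
    assumed to match the empirical fit used by Apache DataSketches.
    """
    if not 8 <= k <= 65535:
        raise ValueError("k must be in the range 8-65535")
    if pmf:
        # Assumed "double-sided" error, relevant for PMF-style queries.
        return 2.446 / k ** 0.9433
    # Assumed "single-sided" error, relevant for rank and quantile queries.
    return 2.296 / k ** 0.9723


# Larger k gives a smaller estimated error, at the cost of a larger sketch.
for k in (8, 200, 800, 65535):
    print(k, f"{approx_kll_rank_error(k):.4%}")
```

Under these assumptions, the default k of 200 corresponds to an estimated rank error of roughly 1.33% for single-sided queries, and increasing k lowers the error while increasing the size of the binary sketch.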
Returns
pyspark.sql.Column: The binary representation of the KllDoublesSketch.
Examples
Python
from pyspark.databricks.sql import functions as dbf
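# Build a single-column DataFrame of doubles; the column is named "value".
df = spark.createDataFrame([1.0, 2.0, 3.0, 4.0, 5.0], "DOUBLE")
# Aggregate the column into a sketch with the default k and fetch the binary result.
result = df.agg(dbf.kll_sketch_agg_double("value")).first()[0]
# The sketch is returned as non-empty binary data.
result is not None and len(result) > 0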
Output
True