kll_merge_agg_float

Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.kll_merge_agg_float(col=<col>, k=<k>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or column name | The column containing binary KllFloatsSketch representations. |
| k | pyspark.sql.Column or int, optional | The k parameter that controls size and accuracy (range 8-65535). |

Returns

pyspark.sql.Column: The merged binary representation of the KllFloatsSketch.

Examples

Python
from pyspark.databricks.sql import functions as dbf
# Build one sketch per DataFrame, each over three FLOAT values.
df1 = spark.createDataFrame([1.0, 2.0, 3.0], "FLOAT")
df2 = spark.createDataFrame([4.0, 5.0, 6.0], "FLOAT")
sketch1 = df1.agg(dbf.kll_sketch_agg_float("value").alias("sketch"))
sketch2 = df2.agg(dbf.kll_sketch_agg_float("value").alias("sketch"))
# Stack the two single-row sketch DataFrames and merge them into one sketch.
merged = sketch1.union(sketch2).agg(dbf.kll_merge_agg_float("sketch").alias("merged"))
# The merged sketch accounts for all six input values.
n = merged.select(dbf.kll_sketch_get_n_float("merged")).first()[0]
n
Output
6
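
Because k is only accepted in the range 8-65535, it can be useful to check a configured value before building the aggregate. The following is a minimal sketch and not part of the Databricks API: validated_k and merge_kwargs are hypothetical helper names, the bounds are assumed to be inclusive, and only integer k values are handled (a Column-valued k is not).

```python
# Hypothetical helpers (not part of pyspark.databricks.sql.functions).
# Assumption: the documented range 8-65535 is inclusive on both ends.
K_MIN, K_MAX = 8, 65535


def validated_k(k=None):
    """Return k unchanged if it is None or an int within [K_MIN, K_MAX]."""
    if k is None:
        # No k given: the merged sketch adopts k from the first input sketch.
        return None
    if isinstance(k, bool) or not isinstance(k, int):
        raise TypeError(f"k must be an int or None, got {type(k).__name__}")
    if not K_MIN <= k <= K_MAX:
        raise ValueError(f"k must be in the range {K_MIN}-{K_MAX}, got {k}")
    return k


def merge_kwargs(col, k=None):
    """Build keyword arguments for kll_merge_agg_float(col=..., k=...)."""
    kwargs = {"col": col}
    k = validated_k(k)
    if k is not None:
        # Only pass k when it was explicitly requested.
        kwargs["k"] = k
    return kwargs
```

With these helpers, the merge in the example above could be written as dbf.kll_merge_agg_float(**merge_kwargs("sketch", k=400)), which would request k=400 for the merged sketch instead of inheriting k from the first input sketch.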