
hll_union_agg

Aggregate function: returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance. Throws an exception if sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.
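The following simplified pure-Python model makes the merge semantics concrete. It is not the DataSketches implementation. It is a plain register-array HyperLogLog in which a union keeps the maximum rank seen in each register, and in which sketches with different lgConfigK values are rejected unless mismatches are explicitly allowed.

`ToyHll`, `toy_union`, and the blake2b-based hash are hypothetical choices made for illustration. The real HllSketch uses different hashing and storage. When mismatches are allowed, the model folds the larger sketch down to the smaller lgConfigK. Treat that folding behavior as an assumption about how the real Union reconciles precision, not as its documented algorithm.

```python
import hashlib

HASH_BITS = 64  # width of the toy hash


def _hash64(value):
    """Deterministic 64-bit hash of str(value); a toy stand-in for the real hash."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyHll:
    """Hypothetical register-array HLL sketch with 2**lg_k registers."""

    def __init__(self, lg_k):
        self.lg_k = lg_k
        self.registers = [0] * (1 << lg_k)  # 0 means "register never hit"

    def update(self, value):
        h = _hash64(value)
        tail_bits = HASH_BITS - self.lg_k
        index = h >> tail_bits  # top lg_k bits choose the register
        tail = h & ((1 << tail_bits) - 1)  # remaining bits determine the rank
        rank = tail_bits - tail.bit_length() + 1  # 1-based position of leftmost 1 bit
        self.registers[index] = max(self.registers[index], rank)
        return self

    def downsample(self, lg_k):
        """Return an equivalent sketch with 2**lg_k registers (lg_k <= self.lg_k)."""
        delta = self.lg_k - lg_k
        out = ToyHll(lg_k)
        for index, rank in enumerate(self.registers):
            if rank == 0:
                continue  # empty registers contribute nothing
            # The low `delta` index bits become the leading bits of the longer tail.
            dropped = index & ((1 << delta) - 1)
            new_rank = delta - dropped.bit_length() + 1 if dropped else delta + rank
            new_index = index >> delta
            out.registers[new_index] = max(out.registers[new_index], new_rank)
        return out


def toy_union(sketches, allow_different_lg_k=False):
    """Merge sketches by keeping the maximum rank in each register."""
    sketches = list(sketches)
    if not sketches:
        raise ValueError("need at least one sketch")
    lg_ks = {s.lg_k for s in sketches}
    if len(lg_ks) > 1 and not allow_different_lg_k:
        raise ValueError(f"sketches have different lgConfigK values: {sorted(lg_ks)}")
    target = min(lg_ks)  # assumption: fold everything to the smallest lg_k
    result = ToyHll(target)
    for sketch in sketches:
        if sketch.lg_k != target:
            sketch = sketch.downsample(target)
        result.registers = [max(a, b) for a, b in zip(result.registers, sketch.registers)]
    return result
```

Under this model, merging a sketch of 1, 2, 2, 3 with a sketch of 4, 5, 5, 6 yields exactly the registers of one sketch built from 1 through 6. The same holds when one input is built with lg_k 14 and folded down to 12, provided `allow_different_lg_k=True`. Otherwise `toy_union` raises `ValueError`.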

Syntax

Python
from pyspark.sql import functions as sf

sf.hll_union_agg(col, allowDifferentLgConfigK=None)

Parameters

| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | The column containing HLL sketches to merge. |
| allowDifferentLgConfigK | pyspark.sql.Column or bool, optional | Allow sketches with different lgConfigK values to be merged (defaults to false). |

Returns

pyspark.sql.Column: The binary representation of the merged HllSketch.

Examples

Example 1: Merge HLL sketches with default settings

Python
from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch"))).show()
Output
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
|                                                6|
+-------------------------------------------------+

Example 2: Merge HLL sketches with allowDifferentLgConfigK explicitly set to False

Python
from pyspark.sql import functions as sf
df1 = spark.createDataFrame([1,2,2,3], "INT")
df1 = df1.agg(sf.hll_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4,5,5,6], "INT")
df2 = df2.agg(sf.hll_sketch_agg("value").alias("sketch"))
df3 = df1.union(df2)
df3.agg(sf.hll_sketch_estimate(sf.hll_union_agg("sketch", False))).show()
Output
+-------------------------------------------------+
|hll_sketch_estimate(hll_union_agg(sketch, false))|
+-------------------------------------------------+
|                                                6|
+-------------------------------------------------+