hll_union

Merges two binary representations of DataSketches HllSketch objects using a DataSketches Union object. Throws an exception if the sketches have different lgConfigK values and allowDifferentLgConfigK is unset or set to false.

Syntax

Python
from pyspark.sql import functions as sf

sf.hll_union(col1, col2, allowDifferentLgConfigK=None)

Parameters

| Parameter | Type | Description |
|---|---|---|
| col1 | pyspark.sql.Column or str | The first HLL sketch. |
| col2 | pyspark.sql.Column or str | The second HLL sketch. |
| allowDifferentLgConfigK | bool, optional | Allow sketches with different lgConfigK values to be merged (defaults to false). |

Returns

pyspark.sql.Column: The binary representation of the merged HllSketch.

Examples

Example 1: Union two HLL sketches

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 4), (2, 5), (2, 5), (3, 6)], "struct<v1:int,v2:int>")
df = df.agg(
    sf.hll_sketch_agg("v1").alias("sketch1"),
    sf.hll_sketch_agg("v2").alias("sketch2")
)
df.select(sf.hll_sketch_estimate(sf.hll_union(df.sketch1, "sketch2"))).show()
Output
+-------------------------------------------------------+
|hll_sketch_estimate(hll_union(sketch1, sketch2, false))|
+-------------------------------------------------------+
|                                                      6|
+-------------------------------------------------------+