hll_sketch_agg

Aggregate function: returns the updatable binary representation of a DataSketches HllSketch, configured with the lgConfigK argument.

Syntax

Python
from pyspark.sql import functions as sf

sf.hll_sketch_agg(col, lgConfigK=None)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or str | The column to aggregate.
lgConfigK | pyspark.sql.Column or int, optional | The log-base-2 of K, where K is the number of buckets or slots for the HllSketch. Defaults to 12 when omitted.
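To make the lgConfigK parameter more concrete, the sketch below converts it to the slot count K = 2 ** lgConfigK and applies the textbook HyperLogLog rule of thumb that the relative standard error is roughly 1.04 / sqrt(K). The helper names `hll_slots` and `approx_relative_error` are hypothetical and not part of PySpark, and the error figure is an approximation from the general HyperLogLog literature, so the DataSketches implementation may behave somewhat differently.

```python
import math


def hll_slots(lg_config_k: int) -> int:
    """Hypothetical helper: number of slots K for a given lgConfigK (K = 2 ** lgConfigK)."""
    return 1 << lg_config_k


def approx_relative_error(lg_config_k: int) -> float:
    """Hypothetical helper: textbook HyperLogLog relative standard error, about 1.04 / sqrt(K).

    This is a rule of thumb, not a guarantee made by Spark or DataSketches.
    """
    return 1.04 / math.sqrt(hll_slots(lg_config_k))


# Compare a few settings: a larger lgConfigK means more slots and a smaller expected error.
for lg_k in (8, 12, 16):
    print(lg_k, hll_slots(lg_k), f"{approx_relative_error(lg_k):.4f}")
```

Each increment of lgConfigK doubles the number of slots, so a larger value trades a bigger sketch for a tighter estimate.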

Returns

pyspark.sql.Column: The binary representation of the HllSketch.

Examples

Example 1: Create HLL sketch with default lgConfigK

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value"))).show()
Output
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+

Example 2: Create HLL sketch with specified lgConfigK

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value", 12))).show()
Output
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+