hll_sketch_estimate

Returns the estimated number of unique values given the binary representation of a DataSketches HllSketch.

Syntax

Python
from pyspark.sql import functions as sf

sf.hll_sketch_estimate(col)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| col | pyspark.sql.Column or str | The HLL sketch binary representation. |

Returns

pyspark.sql.Column: The estimated number of unique values for the HllSketch.

Examples

Example 1: Estimate unique values from HLL sketch

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value"))).show()
Output
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+