theta_sketch_estimate
Returns the estimated number of unique values from the binary representation of an Apache DataSketches Theta sketch, such as one produced by theta_sketch_agg.
Syntax
Python
from pyspark.sql import functions as sf
sf.theta_sketch_estimate(col)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | The Theta sketch binary representation. |
Returns
pyspark.sql.Column: The estimated number of unique values for the ThetaSketch.
Examples
Example 1: Estimate unique values from Theta sketch
Python
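from pyspark.sql import functions as sf
# Four input rows, three distinct values (2 appears twice).
df = spark.createDataFrame([1, 2, 2, 3], "INT")
# Aggregate the column into a Theta sketch, then estimate its distinct count.
df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value"))).show()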
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+