theta_sketch_agg
Aggregate function: returns the compact binary representation of a DataSketches Theta Sketch built from the values in the input column. The sketch size is set by lgNomEntries, the log-base-2 of the number of nominal entries.
Syntax
Python
from pyspark.databricks.sql import functions as dbf
dbf.theta_sketch_agg(col=<col>, lgNomEntries=<lgNomEntries>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | The column containing values to aggregate. |
| lgNomEntries | pyspark.sql.Column or int, optional | The log-base-2 of the number of nominal entries, which sets the size of the sketch. Must be between 4 and 26; defaults to 12. |
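Because lgNomEntries is a logarithm, the number of nominal entries is 2 raised to that value. The following is a minimal sketch of that arithmetic and of the documented 4 to 26 range. The helper name nominal_entries is hypothetical and is not part of the PySpark or Databricks API; it only mirrors the description above and does not reproduce the function's own validation.

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return 2**lg_nom_entries, assuming the documented range of 4 to 26.

    Hypothetical helper for illustration only; the default of 12 matches
    the default documented for theta_sketch_agg.
    """
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError(
            f"lgNomEntries must be between 4 and 26, got {lg_nom_entries}"
        )
    # Shifting 1 left by n bits is the integer form of 2**n.
    return 1 << lg_nom_entries


if __name__ == "__main__":
    # Show the nominal entry count at the bounds, the default, and the
    # value used in the second example on this page.
    for lg in (4, 12, 15, 26):
        print(f"lgNomEntries={lg:>2} -> {nominal_entries(lg):>10,} nominal entries")
```

With the default of 12 the sketch is configured for 4,096 nominal entries; passing 15 raises that to 32,768.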
Returns
pyspark.sql.Column: The binary representation of the Theta Sketch.
Examples
Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value"))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value", 15))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 15))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+