theta_union_agg
Aggregate function: returns the compact binary representation of the Apache DataSketches Theta sketch that is the union of the Theta sketches in the input column.
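To make the union semantics concrete, here is a simplified pure-Python model of a Theta sketch in the K-Minimum-Values style. It is a sketch under stated assumptions, not the DataSketches or Databricks implementation. `ThetaSketchModel`, `union`, and `_hash64` are hypothetical names. The SHA-256-based 64-bit hash is a stand-in (the real library uses a MurmurHash3-based hash), and the model's state is unrelated to the binary format that `theta_union_agg` returns. It shows the core idea: a union takes the smallest theta of its inputs, merges the retained hashes, and keeps at most k = 2^lgNomEntries of them.

```python
import hashlib

MAX_HASH = 2**64  # hashes are mapped into [0, 2**64)


def _hash64(value) -> int:
    # Hypothetical stand-in hash: first 8 bytes of SHA-256 over repr(value).
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ThetaSketchModel:
    """Toy Theta sketch: retains at most k distinct hashes strictly below theta."""

    def __init__(self, k: int, theta: int = MAX_HASH, hashes=()):
        self.k = k
        self.theta = theta  # exclusive upper bound on retained hashes
        self.hashes = set(hashes)
        self._trim()

    def _trim(self) -> None:
        # Drop hashes at or above theta, then keep only the k smallest.
        kept = sorted(h for h in self.hashes if h < self.theta)
        if len(kept) > self.k:
            # The (k+1)-th smallest hash becomes the new theta.
            self.theta = kept[self.k]
            kept = kept[: self.k]
        self.hashes = set(kept)

    def update(self, value) -> None:
        self.hashes.add(_hash64(value))
        self._trim()

    def estimate(self) -> float:
        # Exact count while theta is untouched; otherwise scale the retained
        # count by the sampled fraction theta / MAX_HASH.
        return len(self.hashes) * MAX_HASH / self.theta


def union(sketches, lg_nom_entries: int = 12) -> ThetaSketchModel:
    # Smallest input theta wins; merged hashes are re-trimmed to k entries.
    k = 1 << lg_nom_entries
    theta = min((s.theta for s in sketches), default=MAX_HASH)
    merged = set().union(*(s.hashes for s in sketches))
    return ThetaSketchModel(k, theta, merged)


# Usage: two overlapping streams, each summarised with k = 2**10 entries.
left = ThetaSketchModel(k=1 << 10)
right = ThetaSketchModel(k=1 << 10)
for v in range(6000):
    left.update(v)
for v in range(4000, 10000):
    right.update(v)
merged = union([left, right], lg_nom_entries=10)
print(len(merged.hashes))        # at most 1024 retained hashes
print(round(merged.estimate()))  # close to the true 10000 distinct values
```

Because shared values hash identically, the overlap between the two streams is counted once. Summing two per-stream estimates would roughly count 12000 instead.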
Syntax
```python
from pyspark.databricks.sql import functions as dbf
dbf.theta_union_agg(col=<col>, lgNomEntries=<lgNomEntries>)
```
Parameters
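| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | The column containing the Theta sketches to union. |
| `lgNomEntries` | `pyspark.sql.Column` or `int`, optional | The log-base-2 of the number of nominal entries for the union operation. Must be between 4 and 26; defaults to 12. |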
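Because `lgNomEntries` is a base-2 logarithm, the nominal entry count k grows quickly with it. The default of 12 corresponds to 4096 entries. The helper below is a hypothetical illustration of that arithmetic, not part of the Databricks API. It assumes the documented bounds of 4 and 26 are inclusive.

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return k = 2**lg_nom_entries, enforcing the documented 4..26 range (assumed inclusive)."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError(f"lgNomEntries must be between 4 and 26, got {lg_nom_entries}")
    return 2 ** lg_nom_entries


print(nominal_entries())    # 4096 (the default, lgNomEntries=12)
print(nominal_entries(4))   # 16
print(nominal_entries(26))  # 67108864
```

Larger values keep more hashes, which generally improves accuracy at the cost of a larger sketch.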
Returns
pyspark.sql.Column: The binary representation of the merged Theta Sketch.
Examples
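```python
from pyspark.databricks.sql import functions as dbf

# `spark` is the SparkSession that Databricks notebooks provide.
# A DataFrame created with the atomic "INT" schema has one column named "value".
df1 = spark.createDataFrame([1, 2, 2, 3], "INT")
df1 = df1.agg(dbf.theta_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([4, 5, 5, 6], "INT")
df2 = df2.agg(dbf.theta_sketch_agg("value").alias("sketch"))

# Stack the two single-row sketch DataFrames, union the sketches,
# and estimate the number of distinct values across both inputs.
df3 = df1.union(df2)
df3.agg(dbf.theta_sketch_estimate(dbf.theta_union_agg("sketch"))).show()
```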
Output
```
+--------------------------------------------------+
|theta_sketch_estimate(theta_union_agg(sketch, 12))|
+--------------------------------------------------+
|                                                 6|
+--------------------------------------------------+
```
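Each input above holds far fewer distinct values than the default 4096 nominal entries, so the sketches are expected to be exact. The estimate should therefore match an exact distinct count. The plain-Python check below involves no Spark. It reproduces the 6 and shows why a sketch union is not the same as adding per-sketch estimates when the inputs overlap. The `overlap_*` lists are hypothetical inputs chosen for illustration.

```python
# Exact distinct count for the example inputs above.
values_1 = [1, 2, 2, 3]
values_2 = [4, 5, 5, 6]
print(len(set(values_1) | set(values_2)))  # 6, matching the sketch estimate

# With overlapping inputs, a union counts shared values once,
# so adding the per-input distinct counts overcounts.
overlap_1 = [1, 2, 3]
overlap_2 = [2, 3, 4]
print(len(set(overlap_1)) + len(set(overlap_2)))  # 6 (sum of per-input counts)
print(len(set(overlap_1) | set(overlap_2)))       # 4 (distinct count of the union)
```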