theta_intersection_agg
Aggregate function: returns the compact binary representation of the Apache DataSketches Theta sketch that is the intersection of all Theta sketches in the input column.
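To show what intersecting Theta sketches means, here is a simplified pure-Python model of the idea. It is not the Apache DataSketches implementation or its binary format. The SHA-256 hashing, the class `ThetaSketch`, the helper `intersect`, and the nominal size `k` are hypothetical names and choices for illustration only. Each sketch keeps the hashes that fall below a threshold theta. An intersection takes the smallest theta across the inputs, keeps only the hashes that every sketch retains below that threshold, and estimates the distinct count as retained hashes divided by theta.

```python
import hashlib


def hash_to_unit(value) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using the top 53 bits of SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class ThetaSketch:
    """Hypothetical toy Theta sketch: a threshold theta plus the retained hashes below it."""

    def __init__(self, theta=1.0, hashes=()):
        self.theta = theta
        self.hashes = set(hashes)

    @classmethod
    def build(cls, values, k=4096):
        """Keep the k smallest distinct hashes (KMV-style); theta is the next-smallest hash."""
        unique = sorted({hash_to_unit(v) for v in values})
        if len(unique) <= k:
            return cls(1.0, unique)  # exact mode: every distinct hash is retained
        return cls(unique[k], unique[:k])

    def estimate(self):
        """Estimated distinct count: retained hashes divided by the sampling rate theta."""
        return len(self.hashes) / self.theta


def intersect(sketches):
    """Intersect sketches: use the smallest theta and keep hashes retained by every sketch."""
    sketches = list(sketches)
    theta = min(s.theta for s in sketches)
    common = set.intersection(*(s.hashes for s in sketches))
    return ThetaSketch(theta, (h for h in common if h < theta))


a = ThetaSketch.build([1, 2, 2, 3])
b = ThetaSketch.build([2, 3, 3, 4])
print(intersect([a, b]).estimate())  # 2.0
```

Because the intersection uses the minimum theta of its inputs, it can retain few hashes when the true overlap is small relative to the inputs, so the relative error of the estimate tends to grow in that case.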
Syntax
```python
from pyspark.databricks.sql import functions as dbf
dbf.theta_intersection_agg(col=<col>)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or str | The column containing Theta sketches to intersect. |
Returns
pyspark.sql.Column: The binary representation of the intersected Theta Sketch.
Examples
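```python
from pyspark.databricks.sql import functions as dbf

# A list of ints with the atomic "INT" schema yields one column named "value".
# First sketch covers the distinct values {1, 2, 3}.
df1 = spark.createDataFrame([1, 2, 2, 3], "INT")
df1 = df1.agg(dbf.theta_sketch_agg("value").alias("sketch"))

# Second sketch covers the distinct values {2, 3, 4}.
df2 = spark.createDataFrame([2, 3, 3, 4], "INT")
df2 = df2.agg(dbf.theta_sketch_agg("value").alias("sketch"))

# Stack both one-row DataFrames, intersect the sketches, and estimate the result.
df3 = df1.union(df2)
df3.agg(dbf.theta_sketch_estimate(dbf.theta_intersection_agg("sketch"))).show()
```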
Output
```
+-----------------------------------------------------+
|theta_sketch_estimate(theta_intersection_agg(sketch))|
+-----------------------------------------------------+
|                                                    2|
+-----------------------------------------------------+
```
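For inputs this small, the sketches are expected to retain every distinct value (assuming the default nominal sketch size is far larger than three entries), so the estimate should equal the exact size of the set intersection. A plain-Python cross-check of that count, with the variable names chosen only for illustration:

```python
# Distinct values fed into each sketch in the example above
left = set([1, 2, 2, 3])   # duplicates collapse to {1, 2, 3}
right = set([2, 3, 3, 4])  # duplicates collapse to {2, 3, 4}

# The exact intersection that the sketch estimate approximates
common = left & right      # {2, 3}
print(len(common))         # 2
```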