theta_intersection_agg

Aggregate function: returns the compact binary representation of the Apache DataSketches Theta sketch that is the intersection of all Theta sketches in the input column.
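To make the idea of intersecting sketches concrete, here is a toy model in plain Python. It is a simplified sketch of the general Theta sketch idea (keep the hashes that fall below a threshold theta, intersect the retained hashes, and scale the count by 1/theta), not the DataSketches implementation or its binary format. The names unit_hash, toy_sketch, toy_intersection, and toy_estimate, and the parameter k, are hypothetical and exist only for this illustration.

```python
import hashlib


def unit_hash(value):
    """Map a value to a deterministic pseudo-random float in the unit interval."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def toy_sketch(values, k=4096):
    """Return (theta, retained_hashes), keeping at most the k smallest hashes."""
    hashes = sorted({unit_hash(v) for v in values})
    if len(hashes) <= k:
        # Few distinct values: keep every hash, so the sketch is exact.
        return 1.0, set(hashes)
    # Otherwise the (k+1)-th smallest hash becomes the exclusive threshold.
    return hashes[k], set(hashes[:k])


def toy_intersection(sketches):
    """Intersect sketches: smallest theta, and only hashes present in all inputs."""
    theta = min(t for t, _ in sketches)
    common = set.intersection(*(entries for _, entries in sketches))
    return theta, {h for h in common if h < theta}


def toy_estimate(sketch):
    """Estimate the distinct count as retained entries scaled by 1/theta."""
    theta, entries = sketch
    return len(entries) / theta


a = toy_sketch([1, 2, 2, 3])
b = toy_sketch([2, 3, 3, 4])
result = toy_estimate(toy_intersection([a, b]))
print(result)  # 2.0
```

With so few distinct values, every hash is retained and theta stays at 1.0, so the toy estimate is exact. On large inputs the same arithmetic yields an approximation instead.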

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.theta_intersection_agg(col=<col>)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or column name | The column containing Theta sketches to intersect.

Returns

pyspark.sql.Column: The binary representation of the intersected Theta Sketch.

Examples

Python
from pyspark.databricks.sql import functions as dbf

# Build one Theta sketch per input DataFrame; duplicate values do not change the sketch.
df1 = spark.createDataFrame([1, 2, 2, 3], "INT")
df1 = df1.agg(dbf.theta_sketch_agg("value").alias("sketch"))
df2 = spark.createDataFrame([2, 3, 3, 4], "INT")
df2 = df2.agg(dbf.theta_sketch_agg("value").alias("sketch"))

# Stack both sketches in one column, intersect them, and estimate the distinct count.
df3 = df1.union(df2)
df3.agg(dbf.theta_sketch_estimate(dbf.theta_intersection_agg("sketch"))).show()
Output
+-----------------------------------------------------+
|theta_sketch_estimate(theta_intersection_agg(sketch))|
+-----------------------------------------------------+
|                                                    2|
+-----------------------------------------------------+
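
The estimate of 2 can be cross-checked with exact Python sets, because the two inputs share only the distinct values 2 and 3. This is an illustrative check in plain Python, not part of the Databricks API, and the variable names are hypothetical:

```python
# Distinct values behind each sketch in the example above.
values_1 = set([1, 2, 2, 3])
values_2 = set([2, 3, 3, 4])

# The intersection keeps only the values present in both inputs.
common = values_1 & values_2
expected = len(common)
print(sorted(common), expected)  # [2, 3] 2
```

For inputs this small a Theta sketch would be expected to match the exact count; for large inputs the sketch estimate is approximate.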