
theta_sketch_agg

Aggregate function: returns the compact binary representation of a DataSketches Theta Sketch built from the values in the input column. The sketch is configured with 2^lgNomEntries nominal entries.
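To give a rough intuition for what the sketch estimates, here is a toy distinct-count estimator in plain Python. It is only an illustration under stated assumptions: the function name toy_theta_estimate is hypothetical, SHA-256 stands in for whatever hash the real library uses, and the code is not the DataSketches algorithm or its binary format.

```python
import hashlib


def toy_theta_estimate(values, lg_nom_entries=12):
    """Toy k-minimum-values estimate of the distinct count.

    Illustration only: NOT the DataSketches implementation.
    """
    k = 1 << lg_nom_entries  # nominal entries = 2 ** lg_nom_entries

    # Map each value to a pseudo-uniform number in [0, 1) via a hash.
    # (A real sketch would retain at most about k hashes; this toy keeps
    # them all in memory for simplicity.)
    hashes = set()
    for v in values:
        digest = hashlib.sha256(repr(v).encode("utf-8")).digest()
        hashes.add(int.from_bytes(digest[:8], "big") / 2**64)

    if len(hashes) <= k:
        # Fewer distinct hashes than the sketch can hold: the count is exact.
        return float(len(hashes))

    # Otherwise use only the k smallest hashes; theta is the k-th smallest.
    theta = sorted(hashes)[k - 1]
    return (k - 1) / theta


print(toy_theta_estimate([1, 2, 2, 3]))
```

With only three distinct inputs the toy estimator stays in its exact regime, which mirrors why small inputs typically produce an exact count from theta_sketch_estimate.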

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.theta_sketch_agg(col=<col>, lgNomEntries=<lgNomEntries>)

Parameters

Parameter | Type | Description

col | pyspark.sql.Column or column name | The column containing values to aggregate.

lgNomEntries | pyspark.sql.Column or int, optional | The log-base-2 of nominal entries, where nominal entries is the size of the sketch (must be between 4 and 26, defaults to 12).
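The relationship between lgNomEntries and sketch size can be sketched in plain Python. The helper theta_sketch_sizing below is hypothetical, and the error figure uses the common rule of thumb that a Theta Sketch's relative standard error is roughly 1/sqrt(nominal entries); treat it as an approximation, not a guarantee made by this function.

```python
import math


def theta_sketch_sizing(lg_nom_entries=12):
    """Return (nominal_entries, approximate relative standard error).

    Hypothetical helper for illustration; the 1/sqrt(k) error figure is a
    rule-of-thumb assumption, not something this API reports.
    """
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    k = 2 ** lg_nom_entries  # nominal entries
    return k, 1.0 / math.sqrt(k)


for lg in (4, 12, 15, 26):
    k, rse = theta_sketch_sizing(lg)
    print(f"lgNomEntries={lg:2d}  nominal entries={k:>10,}  approx. RSE={rse:.4%}")
```

Under this approximation, the default of 12 corresponds to 4,096 nominal entries and an error on the order of 1.6%. Larger values trade memory for accuracy.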

Returns

pyspark.sql.Column: The binary representation of the Theta Sketch.

Examples

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value"))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value", 15))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 15))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+