
theta_sketch_agg

Aggregate function: returns the compact binary representation of a DataSketches ThetaSketch built from the values in the input column. The sketch is sized by lgNomEntries, the log-base-2 of the number of nominal entries.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.theta_sketch_agg(col=<col>, lgNomEntries=<lgNomEntries>)

Parameters

col (pyspark.sql.Column or column name): The column containing the values to aggregate.

lgNomEntries (pyspark.sql.Column or int, optional): The log-base-2 of the number of nominal entries, which sets the size of the sketch. Must be between 4 and 26; defaults to 12.
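To make the log-base-2 relationship concrete, here is a minimal plain-Python sketch that converts lgNomEntries into the number of nominal entries and checks the documented range. The helper name `nominal_entries` is hypothetical and not part of the API; it only illustrates the arithmetic.

```python
def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Return 2 ** lg_nom_entries after checking the documented 4..26 range.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    # A left shift by n is the same as multiplying 1 by 2 ** n.
    return 1 << lg_nom_entries


# Print the nominal entries for the range limits, the default, and 15.
for lg in (4, 12, 15, 26):
    print(lg, nominal_entries(lg))
```

With the default of 12 the sketch has 4096 nominal entries. As a general property of theta sketches (not stated on this page), the estimate is exact while the number of distinct values stays below the nominal entries, and a larger lgNomEntries trades memory for accuracy beyond that point.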

Returns

pyspark.sql.Column: The binary representation of the ThetaSketch.

Examples

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value"))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([1,2,2,3], "INT")
df.agg(dbf.theta_sketch_estimate(dbf.theta_sketch_agg("value", 15))).show()
Output
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 15))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
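Because the function is an aggregate, it should also work per group. The sketch below assumes that `theta_sketch_agg` can be used inside `groupBy(...).agg(...)` like other aggregate functions, and that the resulting binary column can be passed to `theta_sketch_estimate` in a later step. The helper name `distinct_per_group` and the column names `sketch` and `approx_distinct` are hypothetical.

```python
def distinct_per_group(df, group_col, value_col, lg_nom_entries=12):
    """Return one row per group with a binary theta sketch and its estimate.

    Hypothetical helper; assumes a Databricks runtime that provides
    pyspark.databricks.sql.functions.
    """
    # Imported lazily so the helper can be defined without a Spark runtime.
    from pyspark.sql import functions as F
    from pyspark.databricks.sql import functions as dbf

    # Build one compact binary sketch per group.
    sketches = df.groupBy(group_col).agg(
        dbf.theta_sketch_agg(value_col, lg_nom_entries).alias("sketch")
    )
    # Derive the approximate distinct count from the stored sketch.
    return sketches.withColumn(
        "approx_distinct", dbf.theta_sketch_estimate(F.col("sketch"))
    )


# Example call (illustrative data; requires an active `spark` session):
# events = spark.createDataFrame(
#     [("a", 1), ("a", 2), ("a", 2), ("b", 3)], "user STRING, item INT"
# )
# distinct_per_group(events, "user", "item").show()
```

Keeping the binary sketch column next to the estimate means the sketch can be stored and estimated again later without rescanning the input rows.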