
approxCountDistinct

Aggregate function: returns a new Column that estimates the number of distinct elements in a specified column or group of columns. Supports Spark Connect.
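To illustrate what an approximate distinct count is, here is a minimal, self-contained sketch of the textbook HyperLogLog estimator in plain Python. Spark implements approx_count_distinct with HyperLogLog++, a refinement of this algorithm, so this is not Spark's code. The helper names `toy_hll_count` and `_hash64`, the use of SHA-256 as the hash, and the choice of p = 9 (512 registers) are illustrative assumptions.

```python
import hashlib
import math


def _hash64(value: object) -> int:
    """Hypothetical helper: map a value to a deterministic 64-bit integer."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def toy_hll_count(values, p: int = 9) -> float:
    """Estimate the number of distinct values with a basic HyperLogLog.

    Uses m = 2**p registers. This is the textbook algorithm, not Spark's HLL++.
    """
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - p)                  # top p bits choose the register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias-correction constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:             # small-range correction (linear counting)
        return m * math.log(m / zeros)
    return raw


# 10,000 distinct values, each repeated 5 times
data = [i % 10_000 for i in range(50_000)]
exact = len(set(data))                       # exact is 10000
estimate = toy_hll_count(data, p=9)
print(exact, round(estimate))
```

Because each register keeps only a maximum, duplicate values never change the estimate, and memory depends on the register count m rather than on the number of rows.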

Warning

Deprecated in 2.1.0. Use approx_count_distinct instead, which takes the same col and rsd arguments.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.approxCountDistinct(col=<col>, rsd=<rsd>)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or column name | The column, or the name of the column, whose distinct values are counted.
rsd | float, optional | The maximum allowed relative standard deviation of the estimate (default = 0.05).
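The rsd parameter trades accuracy for memory. As a rough illustration, assuming the textbook HyperLogLog standard-error bound of about 1.04 / sqrt(m) for m registers (Spark's HyperLogLog++ sizing may use a slightly different constant), the sketch below computes the smallest power-of-two register count that meets a given rsd. The helper `registers_for_rsd` is hypothetical and not part of PySpark.

```python
import math


def registers_for_rsd(rsd: float):
    """Hypothetical helper: return (p, m), where m = 2**p is the smallest
    power-of-two register count whose textbook HyperLogLog standard error
    1.04 / sqrt(m) is at most `rsd`."""
    if not 0 < rsd < 1:
        raise ValueError("rsd must be in (0, 1)")
    # Solve 1.04 / sqrt(2**p) <= rsd for the smallest integer p.
    p = math.ceil(2 * math.log2(1.04 / rsd))
    return p, 1 << p


for rsd in (0.05, 0.01):
    p, m = registers_for_rsd(rsd)
    print(f"rsd={rsd}: p={p}, m={m}, bound={1.04 / math.sqrt(m):.4f}")
```

Under this bound, tightening rsd from 0.05 to 0.01 grows the register count from 512 to 16,384, so higher accuracy costs memory roughly quadratically.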

Examples

See approx_count_distinct for examples.