
countDistinct

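Returns a new Column that counts the distinct values of col, or the distinct combinations of values across col and cols when more than one column is given. Supports Spark Connect.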

This function is an alias of count_distinct; using count_distinct directly is recommended.
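To make the multi-column behavior concrete, here is a plain-Python model of the counting rule. It is a sketch, not the Spark implementation: `count_distinct_model` is a hypothetical helper, each row is assumed to be the tuple of values for `(col, *cols)`, and Python `None` stands in for SQL NULL. It follows the rule documented for Spark SQL's `count(DISTINCT expr[, expr...])`: a row is counted only when all of its values are non-null, and duplicate tuples are counted once.

```python
from typing import Any, Iterable, Sequence


def count_distinct_model(rows: Iterable[Sequence[Any]]) -> int:
    """Hypothetical plain-Python model of COUNT(DISTINCT col, *cols).

    Each row holds the values of (col, *cols) for one DataFrame row.
    None plays the role of SQL NULL.
    """
    seen = set()
    for row in rows:
        values = tuple(row)
        # Rows with a null in any of the selected columns are not counted.
        if any(v is None for v in values):
            continue
        # Identical value tuples collapse into one set entry.
        seen.add(values)
    return len(seen)


# Single column, same data as the examples below: 1, 1, 3 -> 2 distinct values
print(count_distinct_model([(1,), (1,), (3,)]))

# Two columns: distinct (a, b) combinations; the row containing None is skipped
print(count_distinct_model([(1, "x"), (1, "y"), (1, "x"), (2, None)]))
```

Both calls print 2: in the second, `(1, "x")` appears twice but is counted once, and `(2, None)` is excluded because one of its values is null.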

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.countDistinct(col, *cols)

Parameters

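Parameter | Type | Description
col | pyspark.sql.Column or column name | First column to compute on.
cols | pyspark.sql.Column or column name | Zero or more additional columns to compute on. When given, distinct combinations of values across all columns are counted.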

Examples

Python
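from pyspark.sql import SparkSession
from pyspark.databricks.sql import functions as dbf

spark = SparkSession.builder.getOrCreate()  # reuses the notebook's session if one exists
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.count_distinct(df.value)).show()  # values 1, 1, 3 -> 2 distinct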
Output
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+
Python
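from pyspark.sql import SparkSession
from pyspark.databricks.sql import functions as dbf

spark = SparkSession.builder.getOrCreate()  # reuses the notebook's session if one exists
df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
df.select(dbf.countDistinct(df.value)).show()  # alias; same result as count_distinct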
Output
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+