countDistinct

countDistinct returns the number of distinct values in a SparkDataFrame column.

Syntax:

  • countDistinct(SparkDataFrame$colName)

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • colName: Column in SparkDataFrame

Output:

  • Column Object

Example:

require(SparkR)

# Create SparkDataFrame
df <- createDataFrame(airquality)
head(df)
# Use select() to view the column returned by countDistinct()
head(select(df, countDistinct(df$Ozone)))
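
Because countDistinct returns a Column, it can presumably be passed to an aggregation as well as to select(). The following is a minimal sketch of a per-group distinct count, assuming a local Spark installation is available to sparkR.session(); the names per_month and distinct_ozone are hypothetical and chosen only for illustration.

```r
library(SparkR)
sparkR.session()  # assumption: Spark is installed and reachable locally

# Same airquality data as in the example above
df <- createDataFrame(airquality)

# Count distinct Ozone readings separately for each Month
per_month <- summarize(groupBy(df, df$Month),
                       distinct_ozone = countDistinct(df$Ozone))

# Sort by Month so the rows come back in a stable order
head(arrange(per_month, per_month$Month))
```

Grouping first gives one distinct count per month, whereas the select() form above collapses the whole column into a single value.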

Note: countDistinct ignores NA values, so they do not add to the distinct count.
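
The NA behaviour can be checked on a small, made-up column. This sketch assumes a local Spark installation; local_df and sdf are hypothetical names, and the data is invented for the illustration.

```r
library(SparkR)
sparkR.session()  # assumption: Spark is installed and reachable locally

# Toy data: the non-NA values are 1, 2, 2, 3, followed by two NAs
local_df <- data.frame(x = c(1, 2, 2, 3, NA, NA))
sdf <- createDataFrame(local_df)

# Distinct count computed by Spark; the NA entries should be skipped
collect(select(sdf, countDistinct(sdf$x)))

# Base R comparison on the local data, with NAs dropped explicitly
length(unique(na.omit(local_df$x)))  # 3
```

If NA values should be counted as their own category, one option is to replace them with a sentinel value before calling countDistinct.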