countDistinct

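countDistinct returns the number of distinct values in a DataFrame column. It returns a Column expression rather than a number, and the count is computed when that expression is evaluated, for example inside select().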

Syntax:

  • countDistinct(DataFrame$colName)

Parameters:

  • DataFrame: Any SparkR DataFrame
  • colName: Name of a column in DataFrame

Output:

  • Column Object
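
Example:

# Create a SparkR DataFrame from R's built-in airquality data set
df <- createDataFrame(sqlContext, airquality)
head(df)
# countDistinct() only builds a Column expression; select() evaluates it,
# and head() displays the single-row result
head(select(df, countDistinct(df$Ozone)))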
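
Because airquality also exists as a local R data frame, the SparkR result can be cross-checked in plain R without a Spark session. The sketch below uses a hypothetical helper, count_distinct_local(), which is not part of SparkR. It assumes that countDistinct() skips NA values (as the note below states), so the helper drops them before counting.

```r
# Hypothetical helper (not a SparkR function): mirrors countDistinct()
# on a local vector by discarding NA values and counting what is left
count_distinct_local <- function(x) {
  length(unique(x[!is.na(x)]))
}

# airquality ships with base R (datasets package), so no Spark is needed
local_count <- count_distinct_local(airquality$Ozone)
print(local_count)
```

If the two numbers differ, the usual suspect is how missing values were handled on one side.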

Note: countDistinct ignores NA values; missing entries are never counted as a distinct value.
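
This NA behavior differs from base R's unique(), which keeps NA as one more value. The toy vector below is an illustrative assumption, not data from the example above; it shows the gap between the two counts in plain R.

```r
# Toy vector with repeated values and two missing entries
x <- c(5, 5, NA, 7, NA)

# unique() returns 5, NA, 7, so NA is counted once
with_na <- length(unique(x))
print(with_na)      # 3

# Dropping NA first matches countDistinct() semantics: only 5 and 7 remain
without_na <- length(unique(x[!is.na(x)]))
print(without_na)   # 2
```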