sumDistinct

sumDistinct is an aggregate function that returns the sum of the distinct values in the selected column.

Note:

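sumDistinct adds each distinct value in the specified column once. It does not sum the column over the distinct rows of your DataFrame. For example, if two different customers each have an amount of 5, that 5 is counted only once.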

Syntax:

  • sumDistinct(df$col)

Parameters:

  • df: Any SparkR DataFrame
  • col: Column in DataFrame

Output:

  • Column Object

Example:

# Create SparkR DataFrame
df <- createDataFrame(data.frame(customer = c("James", "Peter", "Jane", "James"),
                                 amount = c(5, 5, 6, 5)))
head(df)
# sumDistinct sums the distinct values in df$amount: 5 + 6 = 11
head(select(df, sumDistinct(df$amount)))
# To sum over distinct rows instead, combine distinct() with sum()
# The duplicate (James, 5) row is dropped first, so the sum is 5 + 5 + 6 = 16
head(select(distinct(df), sum(df$amount)))
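
The difference between the two sums can also be checked without a Spark session. The following is a minimal sketch in base R, not SparkR: it assumes a local data.frame with the same toy data, and uses unique() as a stand-in for Spark's distinct handling. The names local_df, sum_of_distinct_values and sum_over_distinct_rows are illustrative only.

```r
# Same toy data as above, held in a plain local data.frame (no Spark needed)
local_df <- data.frame(customer = c("James", "Peter", "Jane", "James"),
                       amount = c(5, 5, 6, 5))

# Analogue of sumDistinct(df$amount): each distinct amount is added once (5 + 6)
sum_of_distinct_values <- sum(unique(local_df$amount))

# Analogue of sum() over distinct(df): duplicate rows are dropped first,
# leaving (James, 5), (Peter, 5), (Jane, 6)
sum_over_distinct_rows <- sum(unique(local_df)$amount)

print(sum_of_distinct_values)   # 11
print(sum_over_distinct_rows)   # 16
```

The two results differ because Peter's amount of 5 duplicates a value but not a row, so it only counts toward the second sum.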