# sumDistinct

sumDistinct is an aggregate function that returns the sum of the distinct values in the selected column. Each value is counted once, no matter how many rows contain it.

Note:

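sumDistinct deduplicates the values of the specified column only. It does not sum the column over the distinct rows of your DataFrame: two rows that differ in other columns but share the same value in the column contribute that value only once.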
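To make the distinction concrete without starting a Spark session, the sketch below reproduces both computations in base R on the same sample data used in the example further down. It only mirrors the semantics: base R's `unique()` stands in for Spark's deduplication, and the variable names (`local_df`, `sum_distinct_values`, and so on) are illustrative, not part of SparkR.

```r
# Same sample data as the SparkR example, as a local base R data.frame
local_df <- data.frame(customer = c("James", "Peter", "Jane", "James"),
                       amount = c(5, 5, 6, 5))

# What sumDistinct(df$amount) computes: deduplicate the column values, then sum
sum_distinct_values <- sum(unique(local_df$amount))    # 5 + 6 = 11

# What it does NOT compute: the sum of amount over the distinct rows
distinct_rows <- unique(local_df)                       # drops the repeated ("James", 5) row
sum_over_distinct_rows <- sum(distinct_rows$amount)     # 5 + 5 + 6 = 16

# Plain sum over all rows, for comparison
sum_all <- sum(local_df$amount)                         # 5 + 5 + 6 + 5 = 21

print(c(sum_distinct_values = sum_distinct_values,
        sum_over_distinct_rows = sum_over_distinct_rows,
        sum_all = sum_all))
```

Peter and James each have an amount of 5, so column-level deduplication keeps a single 5, while row-level deduplication keeps both because the rows differ in `customer`.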

Syntax:

• sumDistinct(df$col)

Parameters:

• df: Any SparkR DataFrame
• col: Numeric column in df whose distinct values are summed

Output:

• Column object, to be used inside select() or agg()

Example:

```r
library(SparkR)
sparkR.session()

# Create a SparkR DataFrame; James appears twice with the same amount
df <- createDataFrame(data.frame(customer = c("James", "Peter", "Jane", "James"),
                                 amount = c(5, 5, 6, 5)))
head(df)

# sumDistinct sums the distinct values in df$amount: 5 + 6 = 11
head(select(df, sumDistinct(df$amount)))

# To exclude duplicate rows instead, call distinct() on the DataFrame, then sum()
# The duplicate record for James is excluded: 5 + 5 + 6 = 16
dedupedDf <- distinct(df)
head(select(dedupedDf, sum(dedupedDf$amount)))
```