distinct

distinct returns a new DataFrame containing only the unique rows of the specified DataFrame. Two rows count as duplicates only when they match in every column.

Syntax:

  • distinct(df)

Parameters:

  • df: Any SparkR DataFrame

Output:

  • SparkR DataFrame

Example:

# Create SparkR DataFrame
df <- createDataFrame(sqlContext, data.frame(customer = c("James", "Peter", "Jane", "James"),
                                             amount = c(5, 5, 6, 5)))
head(df)
# Create a new DataFrame with distinct rows
# The duplicate record for James (same customer and amount) is dropped
newdf <- distinct(df)
head(newdf)
# We can also view the distinct values of a single column.
# Selecting the column first yields a one-column DataFrame, which distinct() then de-duplicates
head(distinct(df[, "customer"]))
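# A quick sanity check, sketched under the assumption that df from above is still available:
# comparing row counts before and after distinct() shows how many rows were dropped.
count(df)            # 4 rows in the original DataFrame
count(distinct(df))  # 3 rows once the duplicate (James, 5) row is removed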