distinct

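distinct returns a new SparkDataFrame containing only the unique rows of the specified SparkDataFrame. Rows are compared across all columns, so two rows count as duplicates only when every column value matches.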

Syntax:

  • distinct(df)

Parameters:

  • df: Any SparkDataFrame

Output:

  • SparkDataFrame
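
Example:

library(SparkR)

# Assumption: start (or reuse) a Spark session; skip this where one is already provided
sparkR.session()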

# Create SparkDataFrame
df <- createDataFrame(data.frame(customer = c("James", "Peter", "Jane", "James"),
                                 amount = c(5, 5, 6, 5)))
head(df)

# Create a new SparkDataFrame with distinct rows.
# The duplicate record for James (same customer and amount) is dropped.
newdf <- distinct(df)
head(newdf)

# distinct can also return the unique values of a particular column.
# Subset the original df to that column first, then apply distinct.
head(distinct(df[,"customer"]))
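
To check how many duplicate rows distinct removed, one option is to compare row counts before and after the call. The following is a minimal sketch, assuming a Spark session can be started locally with sparkR.session(); the names n_dropped and local_rows are illustrative and not part of the original example.

```r
library(SparkR)

# Assumption: start (or reuse) a Spark session; not needed where one is already provided
sparkR.session()

# Same sample data as in the example above
df <- createDataFrame(data.frame(customer = c("James", "Peter", "Jane", "James"),
                                 amount = c(5, 5, 6, 5)))

# Rows removed by distinct(): total row count minus unique row count
n_dropped <- nrow(df) - nrow(distinct(df))
print(n_dropped)

# collect() returns the distinct rows as a local R data.frame for inspection
local_rows <- collect(distinct(df))
print(local_rows)
```

Because distinct compares whole rows, only the second James record (same customer and same amount) is counted as a duplicate here. The order of the rows returned by collect is not guaranteed.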