select

The select function selects the specified columns and returns them as a new SparkDataFrame. This is similar to the SELECT statement in SQL.

Syntax:

  • select(df, "col", ...)
  • select(df, df$col)

Parameters:

  • df: Any SparkDataFrame
  • col: Column of the SparkDataFrame, given either as a name (character string) or as a Column object such as df$col

Output:

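  • SparkDataFrame

# Load SparkR and create a SparkDataFrame from R's built-in airquality data
library(SparkR)
sparkR.session()  # starts a Spark session, or reuses the one already running
df <- createDataFrame(airquality)
head(df)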

Because select returns a SparkDataFrame, use a function such as head, take, or collect to view the result.

# Select a single column by name
head(select(df, "Ozone"))
# Alternative R-like syntax for indicating df columns
head(select(df, df$Ozone))
# We're also able to select multiple columns
head(select(df, "Ozone", "Wind"))
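
The examples above all use head. As a minimal sketch of the other two viewers mentioned, take and collect, the following assumes a local Spark installation that sparkR.session() can start or reuse; the variable names (ozone_wind, first_three, local_df) are illustrative only.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark session can be started or reused

df <- createDataFrame(airquality)
ozone_wind <- select(df, "Ozone", "Wind")

# take() returns the first n rows as a local R data.frame
first_three <- take(ozone_wind, 3)
print(first_three)

# collect() brings every row back to the driver as a local R data.frame.
# That is fine for a small dataset like airquality, but it can exhaust
# driver memory on large SparkDataFrames.
local_df <- collect(ozone_wind)
print(dim(local_df))
```

head and take are the safer choices for a quick look, since both fetch only a limited number of rows.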

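select is also useful for evaluating the Column objects returned by other functions. A call such as countDistinct(df$Ozone) returns an unevaluated Column expression rather than a value; passing that Column to select computes the result.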

# On its own, this returns a Column object rather than the count
countDistinct(df$Ozone)
# Use select() to read Column COUNT(DISTINCT Ozone)
head(select(df, countDistinct(df$Ozone)))
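
The generated column name COUNT(DISTINCT Ozone) is awkward to reference from R. One possible way around this, sketched here under the assumption of a local Spark session, is to rename the expression with SparkR's alias before selecting it. The name n_ozone and the variables n_ozone_col and result are hypothetical choices for illustration.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark session can be started or reused

df <- createDataFrame(airquality)

# Build the Column expression and give it a friendlier name.
# Nothing is computed at this point.
n_ozone_col <- alias(countDistinct(df$Ozone), "n_ozone")

# select() evaluates the expression; collect() returns the one-row
# result as a local R data.frame
result <- collect(select(df, n_ozone_col))
print(result$n_ozone)
```

As in SQL, the distinct count ignores missing values, so NA entries in Ozone do not contribute to the result.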