groupBy

The groupBy function groups a SparkR DataFrame by the specified column(s) so that you can run aggregations on each group, much like a SQL GROUP BY clause. It returns a GroupedData object, which can then be passed to aggregation functions.

Syntax:

  • groupBy(DataFrame, "columnName")

Parameters:

  • DataFrame: Any SparkR DataFrame
  • columnName: String naming the column to group by; pass additional names to group by several columns

Output:

  • GroupedData Object

Example:

# Create SparkR DataFrame
df <- createDataFrame(sqlContext, faithful)
head(df)
# Group data by eruptions
dfGroup <- groupBy(df, "eruptions")
dfGroup

Notice that we can't do much with the GroupedData object at this point: it cannot be printed or viewed as data. To use it, pass it to an aggregate function such as agg or avg.
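
To make that last step concrete, here is a minimal sketch of passing a GroupedData object to aggregate functions. It assumes the same Spark 1.x-style setup as the example above (a `sqlContext` created with `sparkRSQL.init`). The names `dfAvg`, `dfCount`, and `dfPairs` are illustrative only, and the choice to average the `waiting` column is an assumption made for the sake of the example.

```r
library(SparkR)

# Assumed Spark 1.x-style initialisation; skip if sc/sqlContext already exist
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Same DataFrame and grouping as in the example above
df <- createDataFrame(sqlContext, faithful)
dfGroup <- groupBy(df, "eruptions")

# Average waiting time within each eruptions group;
# the result is an ordinary SparkR DataFrame again
dfAvg <- agg(dfGroup, waiting = "avg")
head(dfAvg)

# Number of rows in each group
dfCount <- count(dfGroup)
head(dfCount)

# Grouping by more than one column: pass additional column names
dfPairs <- count(groupBy(df, "eruptions", "waiting"))
head(dfPairs)
```

Unlike dfGroup, the results of agg and count are regular DataFrames, so functions such as head work on them.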