groupBy

The groupBy function groups a SparkDataFrame on one or more specified columns so that you can run aggregations on each group, much like a SQL GROUP BY clause. It returns a GroupedData object, which can then be passed into aggregation functions.

Syntax:

  • groupBy(SparkDataFrame, "columnName")

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • columnName: String, the name of the column to group by (pass several names to group on more than one column)

Output:

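  • GroupedData object

Example:

library(SparkR)

# Create a SparkDataFrame from R's built-in faithful dataset
df <- createDataFrame(faithful)
head(df)

# Group the data by the eruptions column
dfGroup <- groupBy(df, "eruptions")

# Printing the result does not display any rows
dfGroup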

Notice that we can’t do much with the GroupedData object at this point: printing dfGroup does not display any rows, and we can’t view the data. To use a GroupedData object, pass it into an aggregation function such as agg or avg.
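As a minimal sketch of that last step, the snippet below passes a GroupedData object into count, agg, and avg. It assumes a working Spark installation, so it starts a session with sparkR.session(); in a notebook that already provides a session, that call can be dropped. The names dfCount, dfAvg, dfMean, and avg_waiting are illustrative and are not part of the original example.

```r
library(SparkR)

# Assumption: Spark is installed locally and no session is running yet
sparkR.session()

# Same setup as the example above
df <- createDataFrame(faithful)
dfGroup <- groupBy(df, "eruptions")

# count() returns a SparkDataFrame with one row per distinct eruptions value
dfCount <- count(dfGroup)
head(dfCount)

# agg() takes named aggregate expressions; the name becomes the output column
dfAvg <- agg(dfGroup, avg_waiting = avg(df$waiting))
head(dfAvg)

# avg() applied directly to the GroupedData, naming the column to average
dfMean <- avg(dfGroup, "waiting")
head(dfMean)
```

Each call returns an ordinary SparkDataFrame, so the results can be inspected with head or passed on to further transformations.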