agg

The agg function computes aggregations over a DataFrame or GroupedData and returns a new DataFrame holding the aggregated values (plus the grouping columns when the input is GroupedData).

Syntax:

  • agg(data, colName = "aggFunction")
  • agg(data, newColName = aggFunction(DataFrame$colName))

Parameters:

  • data: Any SparkR DataFrame or GroupedData
  • colName: Name of the column to aggregate
  • newColName: Desired name for the aggregated column
  • aggFunction: Aggregate function to apply, e.g. sum, mean, avg, max, min

Output:

  • SparkR DataFrame

Example:

# Create SparkR DataFrame
df <- createDataFrame(sqlContext, faithful)

# Use agg to sum total waiting times
head(agg(df, totalWaiting = sum(df$waiting)))

We can also use agg on grouped data.

# Group data by eruptions
dfGroup <- groupBy(df, "eruptions")

# No newColName is given here, so agg() names the result column aggFunction(colName)
head(agg(dfGroup, waiting = "sum"))

# You can specify a newColName for aggregated columns
head(agg(dfGroup, totalWaiting = sum(df$waiting)))
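
Several aggregations can also be requested in a single agg call by passing more than one named argument. The sketch below is illustrative only: it assumes a local Spark installation and the SparkR 1.x-style initialization (sparkR.init and sparkRSQL.init) that matches the sqlContext used above. The names overall, byEruption, meanWaiting and maxWaiting are made up for this example.

```r
library(SparkR)

# Assumption: SparkR 1.x-style setup on a local master, matching the
# sqlContext-based calls used in the examples above
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, faithful)

# Three named aggregations over the whole DataFrame in one agg() call;
# collect() brings the one-row result back as a local R data.frame
overall <- collect(agg(df,
                       totalWaiting = sum(df$waiting),
                       meanWaiting  = avg(df$waiting),
                       maxWaiting   = max(df$waiting)))
print(overall)

# The same idea on grouped data: one output row per distinct eruptions value
dfGroup <- groupBy(df, "eruptions")
byEruption <- collect(agg(dfGroup,
                          meanWaiting = avg(df$waiting),
                          maxWaiting  = max(df$waiting)))
print(head(byEruption))

sparkR.stop()
```

Each named argument becomes one column of the result, so the ungrouped call returns a single row with three columns, while the grouped call returns the eruptions column followed by the two aggregates.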