The agg function performs aggregations on a SparkDataFrame or on GroupedData and returns a new SparkDataFrame containing the aggregated values.


  • agg(data, colName = "aggFunction")
  • agg(data, newColName = aggFunction(SparkDataFrame$colName))


  • data: Any SparkDataFrame or GroupedData
  • colName: Name of the column in the SparkDataFrame to aggregate
  • newColName: Desired name for the aggregated column
  • aggFunction: Aggregate function to apply, for example sum, mean, avg, max, or min


  • SparkDataFrame
# Create SparkDataFrame
df <- createDataFrame(faithful)

# Use agg to sum total waiting times
head(agg(df, totalWaiting = sum(df$waiting)))
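
Several aggregates can be requested in a single call by passing more than one named argument. The following is a minimal sketch, assuming a local Spark installation is available to sparkR.session(); the column names totalWaiting, meanWaiting, and maxEruption are chosen here for illustration only.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

df <- createDataFrame(faithful)

# Each named argument becomes one column of the one-row result.
stats <- collect(agg(df,
                     totalWaiting = sum(df$waiting),
                     meanWaiting  = avg(df$waiting),
                     maxEruption  = max(df$eruptions)))

print(stats)
```

Because the SparkDataFrame is not grouped, the result has a single row. collect() brings that row back as a local R data.frame so it can be inspected or compared with base R results.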

We can also use agg on grouped data.

# Group data by eruptions
dfGroup <- groupBy(df, "eruptions")

# No newColName is given, so agg() names the result column aggFunction(colName), here sum(waiting)
head(agg(dfGroup, waiting = "sum"))
# You can specify a newColName for aggregated columns
head(agg(dfGroup, totalWaiting = sum(df$waiting)))
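
Grouped aggregation also accepts several named aggregates at once, and the result is an ordinary SparkDataFrame that can be sorted or filtered further. The sketch below assumes a local Spark installation; the names nObs, totalWaiting, and meanWaiting are illustrative and not part of the API.

```r
library(SparkR)
sparkR.session()  # assumption: a local Spark installation is available

df <- createDataFrame(faithful)
dfGroup <- groupBy(df, "eruptions")

# One output row per distinct eruptions value, with three aggregated columns.
perGroup <- agg(dfGroup,
                nObs         = count(df$waiting),
                totalWaiting = sum(df$waiting),
                meanWaiting  = avg(df$waiting))

# Show the groups with the most observations first.
head(arrange(perGroup, desc(perGroup$nObs)))
```

The grouping column eruptions is kept in the result alongside the aggregated columns, so the per-group counts add up to the number of rows in the original data.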