aggregate function

Applies to: Databricks SQL, Databricks Runtime

Aggregates elements in an array using a custom aggregator. This function is a synonym for the reduce function.


aggregate(expr, start, merge [, finish])


  • expr: An ARRAY expression.

  • start: An initial value of any type.

  • merge: A lambda function that combines the accumulator with the current element.

  • finish: An optional lambda function used to finalize the aggregation.


The result type matches the result type of the finish lambda function if one is provided; otherwise it matches the type of start.

Applies merge to an initial state (start) and to each element of the array in turn, reducing them to a single state. If finish is provided, the final state is converted into the result by applying finish; otherwise the final state is the result.
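As a rough mental model of these semantics, the following plain-Python sketch folds an array left to right with `functools.reduce` and then applies an optional finish. The Python names `aggregate` and `trace` are hypothetical and only mirror the SQL parameters; this is not how Databricks implements the function, and it ignores SQL type checking and NULL handling.

```python
from functools import reduce
from typing import Any, Callable, Iterable, List, Optional


def aggregate(expr: Iterable[Any], start: Any,
              merge: Callable[[Any, Any], Any],
              finish: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical Python model of aggregate(expr, start, merge [, finish])."""
    # Fold the elements left to right, threading the accumulator through merge.
    state = reduce(merge, expr, start)
    # Apply finish only when it is supplied; otherwise the final state is the result.
    return finish(state) if finish is not None else state


def trace(expr: Iterable[Any], start: Any,
          merge: Callable[[Any, Any], Any]) -> List[Any]:
    """Return every intermediate accumulator state, beginning with start."""
    states = [start]
    for x in expr:
        states.append(merge(states[-1], x))
    return states


print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x))                       # 6
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))  # 60
print(trace([1, 2, 3], 0, lambda acc, x: acc + x))                           # [0, 1, 3, 6]
```

In this model an empty array simply returns start (or finish(start) when finish is given), because `reduce` returns its initializer when the iterable is empty.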

The merge function takes two parameters: the first is the accumulator, and the second is the element to be aggregated. Both the accumulator and the value that merge returns must be of the type of start. The optional finish function takes one parameter, the final accumulator, and returns the final result.
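To make the roles of the two merge parameters concrete, this sketch uses a minimal Python model of aggregate (a hypothetical helper, not a Databricks API) with a non-commutative merge, so the order of folding is visible in the output. Python does not enforce the rule that the accumulator keeps the type of start; the example simply follows it.

```python
from functools import reduce


def aggregate(expr, start, merge, finish=None):
    # Hypothetical Python model: merge(acc, x) runs once per element, left to right.
    state = reduce(merge, expr, start)
    return state if finish is None else finish(state)


# merge(acc, x): acc is the accumulator (a str, like start), x is the current int element.
# String concatenation is not commutative, so the result shows the left-to-right order.
as_text = aggregate([1, 2, 3], "", lambda acc, x: acc + str(x))
print(repr(as_text))  # '123'

# Swapping the roles inside the lambda body reverses the result.
reversed_text = aggregate([1, 2, 3], "", lambda acc, x: str(x) + acc)
print(repr(reversed_text))  # '321'

# finish receives the final accumulator and may change the result type (str -> int).
as_int = aggregate([1, 2, 3], "", lambda acc, x: acc + str(x), int)
print(repr(as_int))  # 123
```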


> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
 6
> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
 60

> SELECT aggregate(array(1, 2, 3, 4),
                   named_struct('sum', 0, 'cnt', 0),
                   (acc, x) -> named_struct('sum', acc.sum + x, 'cnt', acc.cnt + 1),
                   acc -> acc.sum / acc.cnt) AS avg;
 2.5
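The averaging example can be traced one element at a time. This plain-Python sketch models the named_struct accumulator as a dict; the names `start`, `merge`, and `finish` are hypothetical stand-ins for the SQL arguments, and the code records each intermediate state before applying finish.

```python
from functools import reduce

# Hypothetical Python model: a dict stands in for named_struct('sum', 0, 'cnt', 0).
start = {"sum": 0, "cnt": 0}


def merge(acc, x):
    # Build a new accumulator rather than mutating the old one, as named_struct does.
    return {"sum": acc["sum"] + x, "cnt": acc["cnt"] + 1}


def finish(acc):
    # Convert the final (sum, cnt) state into the average.
    return acc["sum"] / acc["cnt"]


states = [start]
for x in [1, 2, 3, 4]:
    states.append(merge(states[-1], x))

for s in states:
    print(s)
# {'sum': 0, 'cnt': 0}
# {'sum': 1, 'cnt': 1}
# {'sum': 3, 'cnt': 2}
# {'sum': 6, 'cnt': 3}
# {'sum': 10, 'cnt': 4}

avg = finish(reduce(merge, [1, 2, 3, 4], start))
print(avg)  # 2.5
```

Carrying both the running sum and the count in one accumulator is what lets a single pass produce the average; finish then does the one division at the end.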