MISSING_AGGREGATION error condition

The non-aggregating expression <expression> is based on columns which are not participating in the GROUP BY clause.

Add the columns or the expression to the GROUP BY, aggregate the expression, or use <expressionAnyValue> if you do not care which of the values within a group is returned.

Parameters

expression: Non aggregating, non grouping expression in the SELECT list.
expressionAnyValue: expression wrapped in an any_value() aggregate function.

Explanation

Within the context of a query with a GROUP BY clause, the local column-references in the SELECT list must be:

Consumed as an argument to an aggregate function, or
Part of an expression which matches an expression on the GROUP BY clause.

A local column reference is a column that has been resolved to a table-reference in the query's FROM clause.

In other words: Column-references must either be part of the grouping keys, or they must be part of the aggregation.

Databricks matches expressions on best effort: For example it will recognize: SELECT c1 + 5 FROM T GROUP BY 5 + c1 as mathing expressions. But SELECT c1 FROM T GROUP BY c1 + 5 is not a match.

Mitigation

The mitigation of the error depends on the cause:

Did you miss a grouping column?

Add expression, or the relevant subexpression of expression to the GROUP BY clause.
Is the column reference part of a GROUP BY expression which differs from epression?

Match the expression in the SELECT list or simplify the GROUP BY expression.
Are you missing the aggregation?

Wrap the column reference with an aggregate function. If you only want a representative value from the group, you can use any_value(epression).

Examples

SQL
-- Sample data
> CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS
   VALUES ('Smith'  , 'Sam'   , 'UNPIVOT', 10),
          ('Smith'  , 'Sam'   , 'LATERAL',  5),
          ('Shuster', 'Sally' , 'DELETE' ,  7),
          ('Shuster', 'Sally' , 'GRANT'  ,  8);

-- `name` and `firstname` are part of the group by coumns, but incomplete
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
 [MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function.

-- Match the GROUP BY expression
> SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
 Sam Smith      15
 Sally Shuster  15

-- Break up the GROUP BY expression
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15
 Sally  Shuster 15

-- Missing grouping column
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY name;
 [MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function.

-- Add the grouping column
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15
 Sally  Shuster 15

-- Missing aggregate
> SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name;
 [MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function.

-- Add an aggregate
> SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15  ["UNPIVOT","LATERAL"]
 Sally  Shuster 15  ["DELETE","GRANT"]

-- Return any task
> SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15  LATERAL
 Sally  Shuster 15  DELETE

Parameters​

Explanation​

Mitigation​

Examples​

Related​

Parameters

Explanation

Mitigation

Examples

Related