grouping_id

Aggregate function: returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn), where grouping(ci) is 1 if the i-th grouping column is aggregated away in the current row and 0 otherwise.
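To make the bit arithmetic concrete, here is a plain-Python sketch (not part of the PySpark API; the helper name is illustrative) that composes a grouping ID from per-column grouping bits:

Python
def grouping_id_from_bits(bits):
    # bits[i] is 1 if the i-th grouping column is aggregated away, else 0.
    n = len(bits)
    return sum(bit << (n - 1 - i) for i, bit in enumerate(bits))

# With two grouping columns (c2, c3), the grand-total row has both bits set:
assert grouping_id_from_bits([1, 1]) == 3  # (1 << 1) + (1 << 0)
# A row grouped by c2 only (c3 aggregated away):
assert grouping_id_from_bits([0, 1]) == 1  # (0 << 1) + (1 << 0)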

Syntax

Python
from pyspark.sql import functions as sf

sf.grouping_id(*cols)

Parameters

cols : pyspark.sql.Column or str
    Columns to check for. The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).

Returns

pyspark.sql.Column: the level of grouping the row belongs to.

Examples

Example 1: Get the grouping ID in a cube operation

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [(1, "a", "a"), (3, "a", "a"), (4, "b", "c")], ["c1", "c2", "c3"])
df.cube("c2", "c3").agg(sf.grouping_id(), sf.sum("c1")).orderBy("c2", "c3").show()
Output
+----+----+-------------+-------+
|  c2|  c3|grouping_id()|sum(c1)|
+----+----+-------------+-------+
|NULL|NULL|            3|      8|
|NULL|   a|            2|      4|
|NULL|   c|            2|      4|
|   a|NULL|            1|      4|
|   a|   a|            0|      4|
|   b|NULL|            1|      4|
|   b|   c|            0|      4|
+----+----+-------------+-------+
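
Example 2: Get the grouping ID with explicit columns

A minimal follow-up sketch, assuming the same df and spark session as Example 1: naming the grouping columns explicitly is equivalent to the no-argument form, and the returned level can be filtered, here keeping only the grand-total row (level 3, since both grouping columns are aggregated away). The aliases gid and total are illustrative, not part of the API.

Python
from pyspark.sql import functions as sf

df.cube("c2", "c3").agg(
    sf.grouping_id("c2", "c3").alias("gid"), sf.sum("c1").alias("total")
).where(sf.col("gid") == 3).show()
Output
+----+----+---+-----+
|  c2|  c3|gid|total|
+----+----+---+-----+
|NULL|NULL|  3|    8|
+----+----+---+-----+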