grouping_id
Aggregate function: returns the level of grouping, equal to `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`, where `n` is the number of grouping columns.
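The formula above is simply a bitmask: each `grouping(ci)` contributes one bit (1 if the column is aggregated away, 0 if it is present), with the first grouping column in the most significant position. A plain-Python sketch of the bit arithmetic (illustrative only, not Spark code; `grouping_id_from_bits` is a hypothetical helper name):

```python
def grouping_id_from_bits(bits):
    """Combine per-column grouping bits into a single grouping ID.

    bits: list of 0/1 values, one per grouping column, first column first
          (1 = column is aggregated away, 0 = column participates).
    """
    gid = 0
    for b in bits:
        gid = (gid << 1) | b  # shift left, append the next bit
    return gid

# With two grouping columns (c2, c3):
# both aggregated away  -> 0b11 = 3 (the grand-total row)
# only c2 aggregated    -> 0b10 = 2
# only c3 aggregated    -> 0b01 = 1
# neither aggregated    -> 0b00 = 0 (a regular GROUP BY row)
```

These four values correspond to the `grouping_id()` column in the cube example below.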
Syntax
Python
from pyspark.sql import functions as sf
sf.grouping_id(*cols)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `cols` | `ColumnOrName` | Columns to check for. The list of columns should match the grouping columns exactly, or be empty (meaning all the grouping columns). |
Returns
pyspark.sql.Column: the level of grouping the row relates to.
Examples
Example 1: Get grouping ID in cube operation
Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
[(1, "a", "a"), (3, "a", "a"), (4, "b", "c")], ["c1", "c2", "c3"])
df.cube("c2", "c3").agg(sf.grouping_id(), sf.sum("c1")).orderBy("c2", "c3").show()
Output
+----+----+-------------+-------+
| c2| c3|grouping_id()|sum(c1)|
+----+----+-------------+-------+
|NULL|NULL| 3| 8|
|NULL| a| 2| 4|
|NULL| c| 2| 4|
| a|NULL| 1| 4|
| a| a| 0| 4|
| b|NULL| 1| 4|
| b| c| 0| 4|
+----+----+-------------+-------+