groupingSets

Creates a multi-dimensional aggregation for the current DataFrame using the specified grouping sets, so that aggregations can be run on each set.

Syntax

groupingSets(groupingSets: Sequence[Sequence["ColumnOrName"]], *cols: "ColumnOrName")

Parameters

Parameter	Type	Description
`groupingSets`	sequence of sequence of columns or str	The grouping sets. Each inner sequence is one set of columns to group on; an empty sequence groups over the whole DataFrame.
`cols`	Column or str	Additional grouping columns. These columns appear as output columns after aggregation.
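Because `groupingSets` is an ordinary sequence of sequences, it can be built programmatically rather than written out by hand. The following is a minimal sketch in plain Python, assuming two hypothetical helpers, `rollup_sets` and `cube_sets`, that are not part of PySpark. They produce the grouping sets conventionally associated with SQL `ROLLUP` and `CUBE`.

```python
from itertools import combinations


def rollup_sets(*cols):
    """Hypothetical helper (not a PySpark API): hierarchical prefixes of cols,
    from all columns down to the empty set (the grand total)."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube_sets(*cols):
    """Hypothetical helper (not a PySpark API): every subset of cols, largest first."""
    return [
        subset
        for n in range(len(cols), -1, -1)
        for subset in combinations(cols, n)
    ]


print(rollup_sets("city", "car_model"))
# [('city', 'car_model'), ('city',), ()]
print(cube_sets("city", "car_model"))
# [('city', 'car_model'), ('city',), ('car_model',), ()]
```

Either list could be passed as the first argument to `groupingSets`. The rollup-style list is the same set of grouping sets used in the Examples section.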

Returns

GroupedData: Grouping sets of the data based on the specified columns.

Examples

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([
    (100, 'Fremont', 'Honda Civic', 10),
    (100, 'Fremont', 'Honda Accord', 15),
    (100, 'Fremont', 'Honda CRV', 7),
    (200, 'Dublin', 'Honda Civic', 20),
    (200, 'Dublin', 'Honda Accord', 10),
    (200, 'Dublin', 'Honda CRV', 3),
    (300, 'San Jose', 'Honda Civic', 5),
    (300, 'San Jose', 'Honda Accord', 8)
], schema="id INT, city STRING, car_model STRING, quantity INT")

df.groupingSets(
    [("city", "car_model"), ("city",), ()],
    "city", "car_model"
).agg(sf.sum(sf.col("quantity")).alias("sum")).sort("city", "car_model").show()
# +--------+------------+---+
# |    city|   car_model|sum|
# +--------+------------+---+
# |    NULL|        NULL| 78|
# |  Dublin|        NULL| 33|
# |  Dublin|Honda Accord| 10|
# |  Dublin|   Honda CRV|  3|
# |  Dublin| Honda Civic| 20|
# | Fremont|        NULL| 32|
# | Fremont|Honda Accord| 15|
# | Fremont|   Honda CRV|  7|
# | Fremont| Honda Civic| 10|
# |San Jose|        NULL| 13|
# |San Jose|Honda Accord|  8|
# |San Jose| Honda Civic|  5|
# +--------+------------+---+
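To make the semantics of the result concrete, the sketch below reproduces the same numbers in plain Python, without Spark. It is an illustration of what a grouping-set aggregation computes, not how Spark implements it. The function `grouping_sets_sum` and its parameters are assumptions introduced for this example. Each grouping set is aggregated separately, and columns that are not part of the current set are reported as `None` (shown as `NULL` in the output above).

```python
from collections import defaultdict

# Same rows as the example above: (id, city, car_model, quantity).
columns = ("id", "city", "car_model", "quantity")
rows = [
    (100, "Fremont", "Honda Civic", 10),
    (100, "Fremont", "Honda Accord", 15),
    (100, "Fremont", "Honda CRV", 7),
    (200, "Dublin", "Honda Civic", 20),
    (200, "Dublin", "Honda Accord", 10),
    (200, "Dublin", "Honda CRV", 3),
    (300, "San Jose", "Honda Civic", 5),
    (300, "San Jose", "Honda Accord", 8),
]


def grouping_sets_sum(rows, columns, grouping_sets, output_cols, value_col):
    """Hypothetical plain-Python emulation of a SUM over grouping sets."""
    records = [dict(zip(columns, row)) for row in rows]
    result = []
    for grouping_set in grouping_sets:
        totals = defaultdict(int)
        for rec in records:
            # Columns outside the current grouping set become None (NULL).
            key = tuple(
                rec[c] if c in grouping_set else None for c in output_cols
            )
            totals[key] += rec[value_col]
        result.extend(key + (total,) for key, total in totals.items())
    return result


result = grouping_sets_sum(
    rows,
    columns,
    grouping_sets=[("city", "car_model"), ("city",), ()],
    output_cols=("city", "car_model"),
    value_col="quantity",
)

# Sort by city, then car_model, placing None first within each column.
result.sort(key=lambda r: tuple((v is not None, v or "") for v in r[:2]))
for row in result:
    print(row)
```

The first printed row is the grand total from the empty grouping set `()`, the rows with `None` in `car_model` are the per-city subtotals from `("city",)`, and the remaining rows come from `("city", "car_model")`.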
