unpivot
Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set. This is the reverse to groupBy(...).pivot(...).agg(...), except for the aggregation, which cannot be reversed.
Added in Databricks Runtime 11.1
Syntax
unpivot(ids: Union["ColumnOrName", List["ColumnOrName"], Tuple["ColumnOrName", ...]], values: Optional[Union["ColumnOrName", List["ColumnOrName"], Tuple["ColumnOrName", ...]]], variableColumnName: str, valueColumnName: str)
Parameters
Parameter | Type | Description |
|---|---|---|
| str, Column, tuple, list | Column(s) to use as identifiers. Can be a single column or column name, or a list or tuple for multiple columns. |
| str, Column, tuple, list, optional | Column(s) to unpivot. Can be a single column or column name, or a list or tuple for multiple columns. If specified, must not be empty. If not specified, uses all columns that are not set as |
| str | Name of the variable column. |
| str | Name of the value column. |
Returns
DataFrame: Unpivoted DataFrame.
Notes
Supports Spark Connect.
Examples
df = spark.createDataFrame(
[(1, 11, 1.1), (2, 12, 1.2)],
["id", "int", "double"],
)
df.show()
# +---+---+------+
# | id|int|double|
# +---+---+------+
# | 1| 11| 1.1|
# | 2| 12| 1.2|
# +---+---+------+
from pyspark.sql import functions as sf
df.unpivot(
"id", ["int", "double"], "var", "val"
).sort("id", sf.desc("var")).show()
# +---+------+----+
# | id| var| val|
# +---+------+----+
# | 1| int|11.0|
# | 1|double| 1.1|
# | 2| int|12.0|
# | 2|double| 1.2|
# +---+------+----+