stack

Separates the expressions col1, ..., colk into n rows. The output columns are named col0, col1, and so on by default; use alias to assign other names.

Syntax

Python
from pyspark.sql import functions as sf

sf.stack(*cols)

Parameters

Parameter | Type | Description
cols | pyspark.sql.Column or column name | The first element should be a literal int giving the number of rows to produce; the remaining elements are the inputs to be separated into those rows.
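The row layout can be modelled in plain Python. The sketch below is not Spark's implementation and does not call Spark. `stack_layout` is a hypothetical helper that assumes, consistent with the examples on this page, that values fill each row left to right, that the number of output columns is the ceiling of k / n, and that unfilled cells become null (`None` here).

```python
import math


def stack_layout(n, values):
    """Model how stack(n, v1, ..., vk) arranges k values into n rows.

    Hypothetical helper for illustration only; it does not call Spark.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    width = math.ceil(len(values) / n)  # number of output columns
    # Pad with None so that each of the n rows has exactly `width` cells.
    padded = list(values) + [None] * (n * width - len(values))
    return [tuple(padded[i * width:(i + 1) * width]) for i in range(n)]


print(stack_layout(2, [1, 2, 3]))  # [(1, 2), (3, None)]
print(stack_layout(4, [1, 2, 3]))  # [(1,), (2,), (3,), (None,)]
```

Under these assumptions, three inputs with n = 2 give two output columns with one padded cell, and n = 4 gives a single column whose last row is null.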

Examples

Example 1: Stack with 2 rows

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c')).show()
Output
+---+---+---+----+----+
|  a|  b|  c|col0|col1|
+---+---+---+----+----+
|  1|  2|  3|   1|   2|
|  1|  2|  3|   3|NULL|
+---+---+---+----+----+

Example 2: Stack with alias

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c').alias('x', 'y')).show()
Output
+---+---+---+---+----+
|  a|  b|  c|  x|   y|
+---+---+---+---+----+
|  1|  2|  3|  1|   2|
|  1|  2|  3|  3|NULL|
+---+---+---+---+----+

Example 3: Stack with 3 rows

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(3), df.a, df.b, 'c')).show()
Output
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
+---+---+---+----+

Example 4: Stack with 4 rows

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
df.select('*', sf.stack(sf.lit(4), df.a, df.b, 'c')).show()
Output
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
|  1|  2|  3|NULL|
+---+---+---+----+
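
One frequent use of stack is reshaping several columns into label/value rows by pairing each column with a literal label. The sketch below only builds the expression string. `unpivot_expr` is a hypothetical helper, not part of PySpark; it assumes Spark SQL's `stack` generator takes the same argument order as `sf.stack`, that column names contain no backticks or single quotes, and that the columns share a compatible type.

```python
def unpivot_expr(columns, key="key", value="value"):
    """Build a Spark SQL `stack` expression that turns columns into rows.

    Hypothetical helper for illustration; assumes simple column names
    (no backticks or single quotes) and columns of a compatible type.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Each column contributes a (label, value) pair such as 'a', `a`.
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    # n equals the number of columns, so each row holds one pair.
    return f"stack({len(columns)}, {pairs}) as ({key}, {value})"


print(unpivot_expr(["a", "b", "c"]))
# stack(3, 'a', `a`, 'b', `b`, 'c', `c`) as (key, value)
```

With the `df` from the examples above, the string could be passed to `df.selectExpr(unpivot_expr(['a', 'b', 'c']))`, which would be expected to return one row per original column.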