flatten

Creates a single array from an array of arrays. If the arrays are nested more than two levels deep, only one level of nesting is removed. If any of the nested arrays is null, the result is null.
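
The behavior described above can be modeled in plain Python. This is only an illustrative sketch of the semantics, not how Spark implements the function. The helper name `flatten_one_level` is hypothetical, and returning `None` for a null outer array is an assumption rather than something this page states.

```python
from typing import Any, List, Optional


def flatten_one_level(outer: Optional[List[Optional[List[Any]]]]) -> Optional[List[Any]]:
    """Hypothetical plain-Python model of flatten; not Spark's implementation."""
    # Assumption: a null outer array yields null.
    # A null nested array yields null, as in Example 2 on this page.
    if outer is None or any(inner is None for inner in outer):
        return None
    # Remove exactly one level of nesting; deeper levels are left untouched.
    return [item for inner in outer for item in inner]


simple = flatten_one_level([[1, 2, 3], [4, 5], [6]])
with_null = flatten_one_level([None, [4, 5]])
deep = flatten_one_level([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(simple)     # [1, 2, 3, 4, 5, 6]
print(with_null)  # None
print(deep)       # [[1, 2], [3, 4], [5, 6], [7, 8]]
```

The three calls mirror Examples 1 to 3 in the Examples section.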

Syntax

Python
from pyspark.sql import functions as sf

sf.flatten(col)

Parameters

Parameter | Type | Description
col | pyspark.sql.Column or str | The name of the column or expression to be flattened.

Returns

pyspark.sql.Column: A new column that contains the flattened array.

Examples

Example 1: Flattening a simple nested array

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([[1, 2, 3], [4, 5], [6]],)], ['data'])
df.select(sf.flatten(df.data)).show()
Output
+------------------+
|     flatten(data)|
+------------------+
|[1, 2, 3, 4, 5, 6]|
+------------------+

Example 2: Flattening an array with null values

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([None, [4, 5]],)], ['data'])
df.select(sf.flatten(df.data)).show()
Output
+-------------+
|flatten(data)|
+-------------+
|         NULL|
+-------------+

Example 3: Flattening an array with more than two levels of nesting

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([[[1, 2], [3, 4]], [[5, 6], [7, 8]]],)], ['data'])
df.select(sf.flatten(df.data)).show(truncate=False)
Output
+--------------------------------+
|flatten(data)                   |
+--------------------------------+
|[[1, 2], [3, 4], [5, 6], [7, 8]]|
+--------------------------------+

Example 4: Flattening an array with mixed types

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([([['a', 'b', 'c'], [1, 2, 3]],)], ['data'])
df.select(sf.flatten(df.data)).show()
Output
+------------------+
|     flatten(data)|
+------------------+
|[a, b, c, 1, 2, 3]|
+------------------+