array_join

Returns a string column by concatenating the elements of the input array column using the delimiter. Null values within the array can be replaced with a specified string through the null_replacement argument. If null_replacement is not set, null values are ignored.

Syntax

Python
from pyspark.sql import functions as sf

sf.array_join(col, delimiter, null_replacement=None)

Parameters

Parameter         Type                        Description
col               pyspark.sql.Column or str   The input column containing the arrays to be joined.
delimiter         str                         The string used as the delimiter when joining the array elements.
null_replacement  str, optional               The string that replaces null values within the array. If not set, null values are ignored.

Returns

pyspark.sql.Column: A new column of string type, where each value is the result of joining the corresponding array from the input column.

Examples

Example 1: Basic usage of the array_join function.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
df.select(sf.array_join(df.data, ",")).show()
Output
+-------------------+
|array_join(data, ,)|
+-------------------+
|              a,b,c|
|                a,b|
+-------------------+

Example 2: Using array_join with the null_replacement argument.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_join(df.data, ",", "NULL")).show()
Output
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                 a,NULL,c|
+-------------------------+

Example 3: Using array_join without the null_replacement argument.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
df.select(sf.array_join(df.data, ",")).show()
Output
+-------------------+
|array_join(data, ,)|
+-------------------+
|                a,c|
+-------------------+

Example 4: Using array_join when the array column itself is null.

Python
from pyspark.sql import functions as sf
from pyspark.sql.types import StructType, StructField, ArrayType, StringType
schema = StructType([StructField("data", ArrayType(StringType()), True)])
df = spark.createDataFrame([(None,)], schema)
df.select(sf.array_join(df.data, ",")).show()
Output
+-------------------+
|array_join(data, ,)|
+-------------------+
|               NULL|
+-------------------+

Example 5: Using array_join with an array containing only null values.

Python
from pyspark.sql import functions as sf
from pyspark.sql.types import StructType, StructField, ArrayType, StringType
schema = StructType([StructField("data", ArrayType(StringType()), True)])
df = spark.createDataFrame([([None, None],)], schema)
df.select(sf.array_join(df.data, ",", "NULL")).show()
Output
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                NULL,NULL|
+-------------------------+