
listagg

Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.

For the corresponding Databricks SQL function, see listagg aggregate function.
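
The behavior described above (nulls skipped, values joined by an optional delimiter, a null result when nothing remains) can be modeled in plain Python. This is only an illustrative sketch and not Spark's implementation: `listagg_model` is a hypothetical helper name, and it keeps the input order for simplicity, which a distributed aggregation may not guarantee.

```python
from typing import Optional, Sequence


def listagg_model(
    values: Sequence[Optional[str]], delimiter: Optional[str] = None
) -> Optional[str]:
    """Hypothetical pure-Python model of listagg semantics (not Spark code)."""
    # Null inputs are dropped before concatenation.
    non_null = [v for v in values if v is not None]
    # With no non-null input, the result is null (None here).
    if not non_null:
        return None
    # A missing delimiter behaves like an empty separator.
    return (delimiter or "").join(non_null)


print(listagg_model(["a", "b", None, "c"]))        # abc
print(listagg_model(["a", "b", None, "c"], ", "))  # a, b, c
print(listagg_model([None, None]))                 # None
```

The three calls mirror Examples 1, 2, and 4 below.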

Syntax

Python
import pyspark.sql.functions as sf

sf.listagg(col=<col>)

# With delimiter
sf.listagg(col=<col>, delimiter=<delimiter>)

Parameters

| Parameter | Type | Description |
| col | pyspark.sql.Column or str | Target column to compute on. |
| delimiter | pyspark.sql.Column, str, or bytes | Optional. The delimiter used to separate the values. Defaults to None, in which case the values are concatenated with no separator. |

Returns

pyspark.sql.Column: the column for computed results.

Examples

Example 1: Using listagg function.

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.listagg('strings')).show()
Output
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                   abc|
+----------------------+

Example 2: Using listagg function with a delimiter.

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.listagg('strings', ', ')).show()
Output
+--------------------+
|listagg(strings, , )|
+--------------------+
|             a, b, c|
+--------------------+

Example 3: Using listagg function with a binary column and delimiter.

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
df.select(sf.listagg('bytes', b'\x42')).show()
Output
+---------------------+
|listagg(bytes, X'42')|
+---------------------+
|     [01 42 02 42 03]|
+---------------------+
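
To read the binary result in Example 3, it may help to reproduce the same bytes without Spark. The sketch below is plain Python and only illustrates the byte sequence; the variable names are hypothetical, and the space-separated hex view is an assumption about how the displayed cell maps to the underlying bytes.

```python
# Plain-Python illustration of the bytes involved in Example 3 (no Spark needed).
values = [b'\x01', b'\x02', None, b'\x03']
delimiter = b'\x42'

# Skip nulls, then place the delimiter byte between the remaining values.
joined = delimiter.join(v for v in values if v is not None)

# Render each byte as two hex digits separated by spaces.
hex_view = joined.hex(' ')
print(hex_view)  # 01 42 02 42 03
```

The delimiter byte `42` appears between each pair of input bytes, matching the bracketed cell in the output.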

Example 4: Using listagg function on a column with all None values.

Python
import pyspark.sql.functions as sf
from pyspark.sql.types import StructType, StructField, StringType
schema = StructType([StructField("strings", StringType(), True)])
df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
df.select(sf.listagg('strings')).show()
Output
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                  NULL|
+----------------------+