
string_agg

Aggregate function: returns the concatenation of the non-null input values, separated by the delimiter. This function is an alias of listagg.

Syntax

Python
from pyspark.sql import functions as sf

sf.string_agg(col, delimiter=None)

Parameters

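| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | Target column to compute on. |
| delimiter | pyspark.sql.Column, str, or bytes, optional | The delimiter placed between consecutive values. The default is None, in which case the values are concatenated with no separator. |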

Returns

pyspark.sql.Column: a column containing the concatenated result. For a binary input column the result is binary (see Example 3).
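To make the semantics above concrete (nulls skipped, remaining values joined with the delimiter, no separator when the delimiter is None, and str or bytes inputs), here is a plain-Python model. `string_agg_model` is a hypothetical helper for illustration only; it is not part of PySpark and is not how Spark implements the aggregate. Returning None when every input is null is an assumption, not something this page states.

```python
from typing import Iterable, Optional, Union

Value = Union[str, bytes]


def string_agg_model(
    values: Iterable[Optional[Value]], delimiter: Optional[Value] = None
) -> Optional[Value]:
    """Hypothetical local model of string_agg semantics (not Spark's code).

    None entries are skipped; the remaining values are joined with
    `delimiter`. A None delimiter means no separator.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: an all-null (or empty) input yields NULL.
        return None
    # present[0][:0] is "" for str and b"" for bytes, so the default
    # separator has the same type as the input values.
    sep = delimiter if delimiter is not None else present[0][:0]
    return sep.join(present)


print(string_agg_model(["a", "b", None, "c"]))        # abc
print(string_agg_model(["a", "b", None, "c"], ", "))  # a, b, c
print(string_agg_model([b"\x01", b"\x02", None, b"\x03"], b"\x42"))  # b'\x01B\x02B\x03'
```

The three calls mirror the three examples that follow; the bytes result b'\x01B\x02B\x03' is the same byte sequence Spark displays as [01 42 02 42 03].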

Examples

Example 1: Using string_agg function with the default delimiter

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.string_agg('strings')).show()
Output
+-------------------------+
|string_agg(strings, NULL)|
+-------------------------+
|                      abc|
+-------------------------+

Example 2: Using string_agg function with a delimiter

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
df.select(sf.string_agg('strings', ', ')).show()
Output
+-----------------------+
|string_agg(strings, , )|
+-----------------------+
|                a, b, c|
+-----------------------+

Example 3: Using string_agg function with a binary column and delimiter

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
df.select(sf.string_agg('bytes', b'\x42')).show()
Output
+------------------------+
|string_agg(bytes, X'42')|
+------------------------+
|        [01 42 02 42 03]|
+------------------------+
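As an aggregate function, string_agg is typically applied per group, for example df.groupBy('key').agg(sf.string_agg('value', ',')). The sketch below models that grouped behavior in plain Python. `grouped_string_agg_model` is a hypothetical helper, not a PySpark API; an empty string stands in for the None default delimiter, and the all-null group yielding None is an assumption. The model joins values in input order, whereas Spark may not preserve row order across a shuffle, so do not rely on a particular order in real queries.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def grouped_string_agg_model(
    rows: Iterable[Tuple[str, Optional[str]]], delimiter: str = ""
) -> Dict[str, Optional[str]]:
    """Hypothetical local model of groupBy(key).agg(string_agg(value, delimiter))."""
    groups: Dict[str, List[str]] = {}
    for key, value in rows:
        # Register the key even when its value is null, so all-null groups still appear.
        bucket = groups.setdefault(key, [])
        if value is not None:
            bucket.append(value)
    # Assumption: a group whose values are all null yields None (SQL NULL).
    return {key: (delimiter.join(vals) if vals else None) for key, vals in groups.items()}


rows = [("x", "a"), ("y", "b"), ("x", None), ("x", "c"), ("z", None)]
print(grouped_string_agg_model(rows, ","))  # {'x': 'a,c', 'y': 'b', 'z': None}
```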