
split

Splits str around matches of the given pattern.

For the corresponding Databricks SQL function, see split function.
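Because pattern is a Java regular expression, metacharacters such as `.` or `|` must be escaped to split on them literally (for example, passing `r'\.'` instead of `'.'`). The sketch below shows the pitfall with Python's standard `re` module rather than Spark. It assumes that Python and Java regex syntax agree for a plain or escaped dot, which holds here but not for every pattern.

```python
import re

text = "a.b.c"

# Unescaped '.' matches every character, so every resulting piece is empty.
unescaped = re.split(".", text)

# re.escape(".") yields r"\." which matches only a literal dot.
escaped = re.split(re.escape("."), text)

print(unescaped)
print(escaped)
```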

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.split(str=<str>, pattern=<pattern>, limit=<limit>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| str | pyspark.sql.Column or str | A string expression to split. |
| pattern | pyspark.sql.Column or literal string | A string representing a regular expression, which should be a Java regular expression. A plain string is always treated as a regular expression, never as a column name, for backwards compatibility; to read the pattern from a column, pass a Column. |
| limit | pyspark.sql.Column or str or int | Optional; defaults to -1. Controls the number of times pattern is applied. If limit > 0, the resulting array has at most limit elements, and its last entry contains all input beyond the last matched pattern. If limit <= 0, pattern is applied as many times as possible, and the resulting array can be of any size. In addition to an int, limit accepts a Column or a column name. |
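The limit rules can be approximated outside Spark with Python's `re.split`, whose `maxsplit` argument counts splits rather than elements and treats 0 as "no limit". The helper `split_like_spark` below is a hypothetical sketch for reasoning about results, not part of `pyspark.databricks.sql`. It assumes simple patterns such as `[ABC]` or `[1-9]+`, for which Python's `re` and Java's regex engine split the same way.

```python
import re
from typing import List


def split_like_spark(s: str, pattern: str, limit: int = -1) -> List[str]:
    """Hypothetical pure-Python approximation of split(str, pattern, limit).

    Assumes a simple regex without capturing groups or empty matches, where
    Python's re and Java's regex engine produce the same pieces.
    """
    if limit == 1:
        # At most one element: the whole input, pattern never applied.
        # re.split treats maxsplit=0 as "no limit", so handle this case first.
        return [s]
    # limit > 1: at most limit - 1 splits, hence at most `limit` elements.
    # limit <= 0: maxsplit=0 splits on every match; trailing empties are kept.
    maxsplit = limit - 1 if limit > 0 else 0
    return re.split(pattern, s, maxsplit=maxsplit)


print(split_like_spark("oneAtwoBthreeC", "[ABC]"))      # ['one', 'two', 'three', '']
print(split_like_spark("oneAtwoBthreeC", "[ABC]", 2))   # ['one', 'twoBthreeC']
print(split_like_spark("oneAtwoBthreeC", "[ABC]", -2))  # ['one', 'two', 'three', '']
print(split_like_spark("1A2B3C", "[1-9]+", 1))          # ['1A2B3C']
```

One notable difference from Spark: if the pattern contains capturing groups, `re.split` also returns the captured text, while Java's split does not.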

Returns

pyspark.sql.Column: array of separated strings.

Examples

Python
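from pyspark.databricks.sql import functions as dbf
# `spark` is the SparkSession that Databricks notebooks predefine.
df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])
# Default limit (-1): split on every match -> [one, two, three, ]
df.select('*', dbf.split(df.s, '[ABC]')).show()
# limit=2: at most two elements -> [one, twoBthreeC]
df.select('*', dbf.split(df.s, '[ABC]', 2)).show()
# Any limit <= 0 behaves like -1; str may also be given as a column name.
df.select('*', dbf.split('s', '[ABC]', -2)).show()
# pattern and limit may also come from columns and vary per row.
df = spark.createDataFrame([
    ('oneAtwoBthreeC', '[ABC]', 2),
    ('1A2B3C', '[1-9]+', 1),
    ('aa2bb3cc4', '[1-9]+', -1)], ['s', 'p', 'l'])
# Per-row pattern, default limit -> [one, two, three, ], [, A, B, C], [aa, bb, cc, ]
df.select('*', dbf.split(df.s, df.p)).show()
# Per-row pattern and limit ('l' is a column name) -> [one, twoBthreeC], [1A2B3C], [aa, bb, cc, ]
df.select(dbf.split('s', df.p, 'l')).show()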
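To check the two column-driven calls by hand, the sketch below applies each row's pattern `p` and limit `l` with Python's `re.split`. The `rows` list and the `split_row` helper are hypothetical stand-ins for the DataFrame and for Spark's evaluation, and the results assume Python's `re` splits these simple patterns the same way Spark's Java regex engine does.

```python
import re

# Hypothetical in-memory copy of the rows (s, p, l) from the DataFrame.
rows = [
    ("oneAtwoBthreeC", "[ABC]", 2),
    ("1A2B3C", "[1-9]+", 1),
    ("aa2bb3cc4", "[1-9]+", -1),
]


def split_row(s, pattern, limit=-1):
    # limit == 1 means "do not split"; re.split treats maxsplit=0 as unlimited,
    # so that case is handled explicitly.
    if limit == 1:
        return [s]
    return re.split(pattern, s, maxsplit=limit - 1 if limit > 0 else 0)


# Mirrors split(df.s, df.p): per-row pattern, default limit of -1.
per_row_pattern = [split_row(s, p) for s, p, _ in rows]

# Mirrors split('s', df.p, 'l'): per-row pattern and per-row limit.
per_row_pattern_and_limit = [split_row(s, p, l) for s, p, l in rows]

print(per_row_pattern)
print(per_row_pattern_and_limit)
```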