`split` function

Applies to: Databricks SQL Databricks Runtime

Splits str around occurrences that match regex and returns an array with a length of at most limit.

Syntax

split(str, regex [, limit] )

Arguments

str: A STRING expression to be split.
regexp: A STRING expression that is a regular expression that splits str. See Regular expressions for the supported syntax.
limit: An optional INTEGER expression defaulting to 0 (no limit).

Returns

An ARRAY<STRING>.

If limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.

If limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.

Examples

Split on a set of delimiter characters

The character class [ABC] matches any of A, B, or C as a delimiter. The trailing empty element comes from the delimiter at the end of the string.

SQL
> SELECT split('oneAtwoBthreeC', '[ABC]');
 [one,two,three,]

Split a comma-separated string

SQL
> SELECT split('apple,banana,cherry', ',');
 [apple,banana,cherry]

Split on one or more whitespace characters

Use \s+ so that a run of spaces or tabs acts as a single delimiter.

SQL
> SELECT split('the   quick  brown', r'\s+');
 [the,quick,brown]

Limit the number of elements

When limit is greater than 0, the final element holds the remainder of the string.

SQL
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
 [one,twoBthreeC]

> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
 [one,two,three,]

Collation affects matching

The lowercase pattern [abc] matches the uppercase delimiters only under a case-insensitive collation.

SQL
> SELECT split('oneAtwoBthreeC' COLLATE UTF8_BINARY, '[abc]');
 [oneAtwoBthreeC]

> SELECT split('oneAtwoBthreeC' COLLATE UTF8_LCASE, '[abc]');
 [one,two,three,]

Syntax​

Arguments​

Returns​

Examples​

Split on a set of delimiter characters​

Split a comma-separated string​

Split on one or more whitespace characters​

Limit the number of elements​

Collation affects matching​

Related functions​