Skip to main content

split function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Splits str around occurrences that match regex and returns an array with a length of at most limit.

Syntax

split(str, regex [, limit] )

Arguments

  • str: A STRING expression to be split.
  • regexp: A STRING expression that is a regular expression that splits str. See Regular expressions for the supported syntax.
  • limit: An optional INTEGER expression defaulting to 0 (no limit).

Returns

An ARRAY<STRING>.

If limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.

If limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.

Examples

Split on a set of delimiter characters

The character class [ABC] matches any of A, B, or C as a delimiter. The trailing empty element comes from the delimiter at the end of the string.

SQL
> SELECT split('oneAtwoBthreeC', '[ABC]');
[one,two,three,]

Split a comma-separated string

SQL
> SELECT split('apple,banana,cherry', ',');
[apple,banana,cherry]

Split on one or more whitespace characters

Use \s+ so that a run of spaces or tabs acts as a single delimiter.

SQL
> SELECT split('the   quick  brown', r'\s+');
[the,quick,brown]

Limit the number of elements

When limit is greater than 0, the final element holds the remainder of the string.

SQL
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
[one,twoBthreeC]

> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
[one,two,three,]

Collation affects matching

The lowercase pattern [abc] matches the uppercase delimiters only under a case-insensitive collation.

SQL
> SELECT split('oneAtwoBthreeC' COLLATE UTF8_BINARY, '[abc]');
[oneAtwoBthreeC]

> SELECT split('oneAtwoBthreeC' COLLATE UTF8_LCASE, '[abc]');
[one,two,three,]