split function
Applies to: Databricks SQL
Databricks Runtime
Splits str around occurrences that match regex and returns an array with a length of at most limit.
Syntax
split(str, regex [, limit] )
Arguments
str: ASTRINGexpression to be split.regexp: ASTRINGexpression that is a regular expression that splitsstr. See Regular expressions for the supported syntax.limit: An optionalINTEGERexpression defaulting to 0 (no limit).
Returns
An ARRAY<STRING>.
If limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched regex.
If limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.
Examples
Split on a set of delimiter characters
The character class [ABC] matches any of A, B, or C as a delimiter. The trailing empty element comes from the delimiter at the end of the string.
> SELECT split('oneAtwoBthreeC', '[ABC]');
[one,two,three,]
Split a comma-separated string
> SELECT split('apple,banana,cherry', ',');
[apple,banana,cherry]
Split on one or more whitespace characters
Use \s+ so that a run of spaces or tabs acts as a single delimiter.
> SELECT split('the quick brown', r'\s+');
[the,quick,brown]
Limit the number of elements
When limit is greater than 0, the final element holds the remainder of the string.
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
[one,twoBthreeC]
> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
[one,two,three,]
Collation affects matching
The lowercase pattern [abc] matches the uppercase delimiters only under a case-insensitive collation.
> SELECT split('oneAtwoBthreeC' COLLATE UTF8_BINARY, '[abc]');
[oneAtwoBthreeC]
> SELECT split('oneAtwoBthreeC' COLLATE UTF8_LCASE, '[abc]');
[one,two,three,]