regexp_extract_all function
Applies to: Databricks SQL
Databricks Runtime
Extracts all of the strings in str that match the regexp expression and correspond to the regex group index.
Syntax
regexp_extract_all(str, regexp [, idx] )
Arguments
str: ASTRINGexpression to be matched.regexp: ASTRINGexpression with a matching pattern.idx: An optional integral number expression greater or equal 0 with default 1.
Returns
An ARRAY<STRING>.
The regexp string must be a regular expression. See Regular expressions for the supported syntax.
When using literals, use raw-literal (r prefix) to avoid escape character pre-processing.
regexp may contain multiple groups.
idx indicates which regex group to extract.
An idx of 0 means match the entire regular expression.
Common error conditions
Examples
Extract a capture group from every match
idx selects which capture group to return from each match.
SQL
> SELECT regexp_extract_all('100-200, 300-400', r'(\d+)-(\d+)', 1);
[100, 300]
> SELECT regexp_extract_all('100-200, 300-400', r'(\d+)-(\d+)', 2);
[200, 400]
Extract every number in a string
SQL
> SELECT regexp_extract_all('There are 10 cats, 20 dogs, and 30 birds', r'(\d+)', 1);
[10, 20, 30]
Extract every hashtag
SQL
> SELECT regexp_extract_all('Posts tagged #data #ai #sql', r'#(\w+)', 1);
[data, ai, sql]
Extract every match of a group-less pattern
When the pattern has no capture group, use an idx of 0 to return the whole match.
SQL
> SELECT regexp_extract_all('Send to alice@example.com or bob@example.org', r'\w+@\w+\.\w+', 0);
[alice@example.com, bob@example.org]
Invalid regex pattern
SQL
> SELECT regexp_extract_all('abc', '[invalid', 0);
Error: INVALID_PARAMETER_VALUE.PATTERN