Skip to main content

regexp_extract_all function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Extracts all of the strings in str that match the regexp expression and correspond to the regex group index.

Syntax

regexp_extract_all(str, regexp [, idx] )

Arguments

  • str: A STRING expression to be matched.
  • regexp: A STRING expression with a matching pattern.
  • idx: An optional integral number expression greater or equal 0 with default 1.

Returns

An ARRAY<STRING>.

The regexp string must be a regular expression. See Regular expressions for the supported syntax.

When using literals, use raw-literal (r prefix) to avoid escape character pre-processing.

regexp may contain multiple groups. idx indicates which regex group to extract. An idx of 0 means match the entire regular expression.

Common error conditions

Examples

Extract a capture group from every match

idx selects which capture group to return from each match.

SQL
> SELECT regexp_extract_all('100-200, 300-400', r'(\d+)-(\d+)', 1);
[100, 300]

> SELECT regexp_extract_all('100-200, 300-400', r'(\d+)-(\d+)', 2);
[200, 400]

Extract every number in a string

SQL
> SELECT regexp_extract_all('There are 10 cats, 20 dogs, and 30 birds', r'(\d+)', 1);
[10, 20, 30]

Extract every hashtag

SQL
> SELECT regexp_extract_all('Posts tagged #data #ai #sql', r'#(\w+)', 1);
[data, ai, sql]

Extract every match of a group-less pattern

When the pattern has no capture group, use an idx of 0 to return the whole match.

SQL
> SELECT regexp_extract_all('Send to alice@example.com or bob@example.org', r'\w+@\w+\.\w+', 0);
[alice@example.com, bob@example.org]

Invalid regex pattern

SQL
> SELECT regexp_extract_all('abc', '[invalid', 0);
Error: INVALID_PARAMETER_VALUE.PATTERN