
regexp_extract_all

Extracts all substrings of str that match the Java regex regexp, returning for each match the text captured by the regex group at index idx.

For the corresponding Databricks SQL function, see regexp_extract_all function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.regexp_extract_all(str=<str>, regexp=<regexp>, idx=<idx>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| str | pyspark.sql.Column or str | Target column to work on. |
| regexp | pyspark.sql.Column or str | Regex pattern to apply. |
| idx | pyspark.sql.Column or int, optional | Matched group id. |
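
To make the role of idx concrete, the following is a minimal pure-Python sketch of the group-selection semantics. It is not the Spark implementation: extract_all_sketch is a hypothetical helper built on Python's re module, whose regex dialect differs from Java's in some details, and its default of idx=1 is chosen to mirror the default documented for the Spark SQL function. It only illustrates that one value is collected per match, taken from the requested group.

```python
import re


def extract_all_sketch(text, pattern, idx=1):
    """Hypothetical stand-in for illustration only, not the Spark function.

    Collects, for every non-overlapping match of `pattern` in `text`,
    the substring captured by group `idx` (0 means the whole match).
    """
    return [m.group(idx) for m in re.finditer(pattern, text)]


text = "100-200, 300-400"
pattern = r"(\d+)-(\d+)"

whole_matches = extract_all_sketch(text, pattern, 0)  # entire match per hit
first_groups = extract_all_sketch(text, pattern)      # group 1 (assumed default)
second_groups = extract_all_sketch(text, pattern, 2)  # group 2

print(whole_matches)
print(first_groups)
print(second_groups)
```

With the sample input, group 0 yields the full matches, group 1 the numbers before each hyphen, and group 2 the numbers after it.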

Examples

Python
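from pyspark.databricks.sql import functions as dbf
# One row holding the input string and, in a second column, the pattern itself.
df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
# idx omitted: group 1 of each match is returned, here [100, 300].
df.select('*', dbf.regexp_extract_all('str', dbf.lit(r'(\d+)-(\d+)'))).show()
# idx passed as a Column: same result as above.
df.select('*', dbf.regexp_extract_all('str', dbf.lit(r'(\d+)-(\d+)'), dbf.lit(1))).show()
# idx passed as an int: group 2 of each match, here [200, 400].
df.select('*', dbf.regexp_extract_all('str', dbf.lit(r'(\d+)-(\d+)'), 2)).show()
# The pattern is read from the regexp column instead of a literal.
df.select('*', dbf.regexp_extract_all('str', dbf.col("regexp"))).show()
# A plain string argument is interpreted as a column name, so "regexp" refers to the column.
df.select('*', dbf.regexp_extract_all(dbf.col('str'), "regexp")).show()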