regexp_extract

Extracts a specific group matched by the Java regex pattern from the specified string column. If the regex does not match, or the specified group does not match, an empty string is returned.

For the corresponding Databricks SQL function, see regexp_extract function.
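
The empty-string rule described above can be sketched in plain Python. This is only an illustration, not the library's implementation: `regexp_extract_sketch` is a hypothetical helper, and it uses Python's `re` module rather than Java regex, so it is a fair stand-in only for simple patterns where the two engines agree.

```python
import re


def regexp_extract_sketch(s: str, pattern: str, idx: int) -> str:
    """Hypothetical stand-in that mimics the empty-string rule.

    Assumption: Python's `re` is used in place of Java regex.
    """
    m = re.search(pattern, s)
    if m is None:
        # The regex did not match at all.
        return ""
    group = m.group(idx)
    # An optional group that did not participate in the match is None.
    return "" if group is None else group


print(repr(regexp_extract_sketch("100-200", r"(\d+)-(\d+)", 1)))  # '100'
print(repr(regexp_extract_sketch("foo", r"(\d+)", 1)))            # ''
print(repr(regexp_extract_sketch("aaaac", "(a+)(b)?(c)", 2)))     # ''
```

Because both failure cases produce an empty string, the result alone does not tell you whether the pattern failed to match or the requested group was simply absent.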

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.regexp_extract(str=<str>, pattern=<pattern>, idx=<idx>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| str | pyspark.sql.Column or str | Target column to work on. |
| pattern | str | Java regex pattern to apply. |
| idx | int | Index of the matched group to extract. |

Examples

Python
from pyspark.databricks.sql import functions as dbf

# Pattern matches; group 1 is '100'.
df = spark.createDataFrame([('100-200',)], ['str'])
df.select('*', dbf.regexp_extract('str', r'(\d+)-(\d+)', 1)).show()

# Pattern does not match; the result is an empty string.
df = spark.createDataFrame([('foo',)], ['str'])
df.select('*', dbf.regexp_extract('str', r'(\d+)', 1)).show()

# Pattern matches, but optional group 2 does not; the result is an empty string.
df = spark.createDataFrame([('aaaac',)], ['str'])
df.select('*', dbf.regexp_extract(dbf.col('str'), '(a+)(b)?(c)', 2)).show()