
regexp_substr

Returns the first substring of the string str that matches the Java regular expression regexp. If there is no match, the result is null.

For the corresponding Databricks SQL function, see regexp_substr function.
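The first-match-or-null behavior described above can be approximated in plain Python without a Spark session. This is only a rough sketch: the helper name regexp_substr_sketch is hypothetical, and it uses Python's re module, which differs from Java regex in some syntax, so it is a fair analogue only for simple patterns such as \d+.

```python
import re
from typing import Optional


def regexp_substr_sketch(text: str, pattern: str) -> Optional[str]:
    """Hypothetical stand-in: return the first match of pattern in text, or None.

    Uses Python's re engine, not Java's, so this only approximates the
    Spark function for patterns that both engines treat the same way.
    """
    match = re.search(pattern, text)  # scans left to right, stops at the first match
    return match.group(0) if match else None


print(regexp_substr_sketch("1a 2b 14m", r"\d+"))  # 1
print(regexp_substr_sketch("1a 2b 14m", r"mmm"))  # None
```

Only the leftmost match is returned, so the later digits in the input ("2" and "14") are ignored.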

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.regexp_substr(str=<str>, regexp=<regexp>)

Parameters

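Parameter | Type | Description
str | pyspark.sql.Column or str | Target column to work on.
regexp | pyspark.sql.Column or str | Regex pattern to apply. A plain Python string is read as a column name, so wrap a literal pattern in dbf.lit().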

Examples

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
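# Literal pattern: the first run of digits in "1a 2b 14m" is "1".
df.select('*', dbf.regexp_substr('str', dbf.lit(r'\d+'))).show()
# No match for the literal pattern, so the result is null.
df.select('*', dbf.regexp_substr('str', dbf.lit(r'mmm'))).show()
# Pattern read per row from the "regexp" column.
df.select('*', dbf.regexp_substr("str", dbf.col("regexp"))).show()
# A plain string names a column, so "regexp" here is the column, not a literal pattern.
df.select('*', dbf.regexp_substr(dbf.col("str"), "regexp")).show()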