regexp_substr
Returns the first substring of the string str that matches the Java regular expression regexp. If regexp does not match any part of str, the result is null.
For the corresponding Databricks SQL function, see regexp_substr function.
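To make the matching rule concrete, here is a minimal plain-Python model of the per-value behavior. It is a sketch under stated assumptions, not the Spark implementation. The function name regexp_substr_model is hypothetical, None stands in for SQL null, and Python's re module is assumed to behave like Java regex for simple patterns such as \d+. The two regex dialects differ in places, and edge cases such as zero-length matches are not modeled.

```python
import re
from typing import Optional


def regexp_substr_model(s: Optional[str], pattern: Optional[str]) -> Optional[str]:
    """Hypothetical plain-Python model of regexp_substr for a single value.

    Returns the first substring of `s` that matches `pattern`, or None
    (standing in for SQL null) when there is no match or an input is None.
    """
    if s is None or pattern is None:
        return None
    match = re.search(pattern, s)  # leftmost match, scanning from the start of s
    return match.group(0) if match else None


print(regexp_substr_model("1a 2b 14m", r"\d+"))  # 1
print(regexp_substr_model("1a 2b 14m", r"mmm"))  # None
```

Only the first match is returned: "14" also matches \d+, but "1" occurs earlier in the string.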
Syntax
Python
from pyspark.databricks.sql import functions as dbf
dbf.regexp_substr(str=<str>, regexp=<regexp>)
Parameters
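| Parameter | Type | Description |
|---|---|---|
| `str` | `pyspark.sql.Column` or `str` | Target column to work on (a Column or a column name). |
| `regexp` | `pyspark.sql.Column` or `str` | Regex pattern to apply, in Java regular expression syntax. A plain string is treated as a column name; wrap a literal pattern in `lit()`. |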
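Because regexp accepts a column, the pattern can differ from row to row, as in the dbf.col("regexp") example. The sketch below models that in plain Python over a hypothetical list of (str, regexp) rows. It is not PySpark code: the names rows and first_match are illustrative only, and Python's re syntax only approximates Java regex.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical sample rows: (str, regexp), mirroring the two columns of the example DataFrame.
rows: List[Tuple[str, str]] = [
    ("1a 2b 14m", r"\d+"),
    ("1a 2b 14m", r"[a-z]+"),
    ("1a 2b 14m", r"mmm"),
]


def first_match(s: str, pattern: str) -> Optional[str]:
    """Return the first substring of s matching pattern, or None if nothing matches."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


# Each row is evaluated with its own pattern, like passing a column as `regexp`.
results = [first_match(s, p) for s, p in rows]
print(results)  # ['1', 'a', None]
```

By contrast, wrapping the pattern in dbf.lit(...) applies one pattern to every row, while a bare string such as "regexp" names a column, as the last example shows.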
Examples
Python
from pyspark.databricks.sql import functions as dbf
# Assumes `spark` is an active SparkSession, as in a Databricks notebook.
df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
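# Literal pattern: the first run of digits in "1a 2b 14m" is "1".
df.select('*', dbf.regexp_substr('str', dbf.lit(r'\d+'))).show()
# Literal pattern with no match: the result is null.
df.select('*', dbf.regexp_substr('str', dbf.lit(r'mmm'))).show()
# Pattern read per row from the "regexp" column (here \d+): the result is "1".
df.select('*', dbf.regexp_substr("str", dbf.col("regexp"))).show()
# The bare string "regexp" is a column name, not a pattern: the result is again "1".
df.select('*', dbf.regexp_substr(dbf.col("str"), "regexp")).show()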