parse_url

Extracts a specified part from a URL. When the part to extract is QUERY and a key is provided, it returns the value of the query parameter with that key.

Syntax

Python
from pyspark.sql import functions as sf

sf.parse_url(url, partToExtract, key=None)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | pyspark.sql.Column or str | A column of strings, each representing a URL. |
| partToExtract | pyspark.sql.Column or str | A column of strings, each representing the part to extract from the URL. |
| key | pyspark.sql.Column or str, optional | A column of strings, each representing the key of a query parameter in the URL. |

Returns

pyspark.sql.Column: A new column of strings, each representing the value of the extracted part from the URL.
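
To make the part names concrete, here is a rough pure-Python analogue built on the standard library's urllib.parse. It only illustrates which portion of a URL each name refers to and is not Spark's implementation. The helper name parse_url_sketch, the sample URL with credentials and a port, and the choice to return None for absent parts are assumptions made for this sketch. The part names (HOST, PATH, QUERY, REF, PROTOCOL, FILE, AUTHORITY, USERINFO) are the ones listed in the Spark SQL documentation for parse_url.

```python
from urllib.parse import urlsplit


def parse_url_sketch(url, part, key=None):
    """Hypothetical stand-in for parse_url, for illustration only (not Spark)."""
    pieces = urlsplit(url)
    # Split "user:pw@host:port" into user info and host[:port].
    userinfo, at_sign, host_port = pieces.netloc.rpartition("@")

    if key is not None:
        # Assumption for this sketch: a key only makes sense with QUERY.
        if part != "QUERY":
            return None
        for pair in pieces.query.split("&"):
            name, eq, value = pair.partition("=")
            if eq and name == key:
                return value
        return None

    file_part = pieces.path + ("?" + pieces.query if pieces.query else "")
    parts = {
        "PROTOCOL": pieces.scheme,
        "AUTHORITY": pieces.netloc,
        "USERINFO": userinfo if at_sign else "",
        "HOST": host_port.split(":")[0],  # simplification: ignores IPv6 literals
        "PATH": pieces.path,
        "QUERY": pieces.query,
        "REF": pieces.fragment,
        "FILE": file_part,
    }
    # Unknown part names and empty components both come back as None here.
    return parts.get(part) or None


# Hypothetical sample URL that exercises every part.
sample = "https://user:pw@spark.apache.org:8080/path?query=1&x=2#top"
for name in ["PROTOCOL", "AUTHORITY", "USERINFO", "HOST",
             "PATH", "QUERY", "REF", "FILE"]:
    print(name, "->", parse_url_sketch(sample, name))
print("key x ->", parse_url_sketch(sample, "QUERY", "x"))
```

Edge cases such as malformed URLs, IPv6 hosts, or repeated keys may behave differently in Spark, so treat this only as a map of the part names.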

Examples

Example 1: Extracting the query part from a URL

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [("https://spark.apache.org/path?query=1", "QUERY")],
    ["url", "part"]
)
df.select(sf.parse_url(df.url, df.part)).show()
Output
+--------------------+
|parse_url(url, part)|
+--------------------+
|             query=1|
+--------------------+

Example 2: Extracting the value of a specific query parameter from a URL

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [("https://spark.apache.org/path?query=1", "QUERY", "query")],
    ["url", "part", "key"]
)
df.select(sf.parse_url(df.url, df.part, df.key)).show()
Output
+-------------------------+
|parse_url(url, part, key)|
+-------------------------+
|                        1|
+-------------------------+

Example 3: Extracting the protocol part from a URL

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [("https://spark.apache.org/path?query=1", "PROTOCOL")],
    ["url", "part"]
)
df.select(sf.parse_url(df.url, df.part)).show()
Output
+--------------------+
|parse_url(url, part)|
+--------------------+
|               https|
+--------------------+

Example 4: Extracting the host part from a URL

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [("https://spark.apache.org/path?query=1", "HOST")],
    ["url", "part"]
)
df.select(sf.parse_url(df.url, df.part)).show()
Output
+--------------------+
|parse_url(url, part)|
+--------------------+
|    spark.apache.org|
+--------------------+

Example 5: Extracting the path part from a URL

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [("https://spark.apache.org/path?query=1", "PATH")],
    ["url", "part"]
)
df.select(sf.parse_url(df.url, df.part)).show()
Output
+--------------------+
|parse_url(url, part)|
+--------------------+
|               /path|
+--------------------+