xpath

Returns a string array of values within the nodes of xml that match the XPath expression.

Syntax

Python
from pyspark.sql import functions as sf

sf.xpath(xml, path)

Parameters

Parameter    Type                         Description
xml          pyspark.sql.Column or str    XML column or column name.
path         pyspark.sql.Column or str    XPath expression.

Examples

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame(
    [('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>',)], ['x'])
df.select(sf.xpath(df.x, sf.lit('a/b/text()'))).show()
Output
+--------------------+
|xpath(x, a/b/text())|
+--------------------+
|        [b1, b2, b3]|
+--------------------+