from_xml

Parses a column containing an XML string into a row with the specified schema. Returns null if the string cannot be parsed.

Syntax

Python
from pyspark.sql import functions as sf

sf.from_xml(col, schema, options=None)

Parameters

Parameter	Type	Description
`col`	`pyspark.sql.Column` or str	A column, or the name of a column, containing XML strings.
`schema`	`StructType`, `pyspark.sql.Column`, or str	A StructType, a Column, or a Python string literal holding a DDL-formatted string, used as the schema when parsing the XML column.
`options`	dict, optional	Options that control parsing. Accepts the same options as the XML data source.
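As a minimal sketch of passing `options`, the helper below forwards a plain dict to `from_xml`. The helper name `parse_strict`, the constant `XML_OPTIONS`, and the default column name are hypothetical; the sketch also assumes that the `mode` option accepts `FAILFAST` here as it does in the XML data source.

```python
# Options are passed as a dict of string keys and string values.
# Assumption: "mode" is honored by from_xml as in the XML data source.
XML_OPTIONS = {"mode": "FAILFAST"}


def parse_strict(df, xml_col="value", schema="STRUCT<a: BIGINT>"):
    """Return a DataFrame with a single parsed column named `xml`.

    `df` is any DataFrame that has a string column `xml_col` holding XML.
    """
    # Imported inside the function so the module loads without PySpark.
    from pyspark.sql import functions as sf

    parsed = sf.from_xml(sf.col(xml_col), schema, XML_OPTIONS)
    return df.select(parsed.alias("xml"))
```

With a DataFrame that has a `value` column of XML strings, `parse_strict(df).collect()` would run the parse. Under the assumption above, a malformed row would be expected to raise an error instead of producing null.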

Returns

pyspark.sql.Column: a new column of complex type built from the given XML object.
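Because the result is a struct column, individual fields can be pulled out of it afterwards. The following is a sketch only: the helper name `extract_field` and its defaults are hypothetical, and it assumes the input DataFrame has a string column of XML whose root element contains the requested field.

```python
def extract_field(df, field="a", xml_col="value", schema="STRUCT<a: BIGINT>"):
    """Parse `xml_col` with `schema` and return only `field` as a column."""
    # Imported inside the function so the module loads without PySpark.
    from pyspark.sql import functions as sf

    # from_xml yields a struct column; getField selects one member of it.
    parsed = sf.from_xml(sf.col(xml_col), schema)
    return df.select(parsed.getField(field).alias(field))
```

Calling `extract_field(df).show()` on a DataFrame with a `value` column of XML strings would display the parsed field as a top-level column.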

Examples

Example 1: Parsing XML with a DDL-formatted string schema

Python
import pyspark.sql.functions as sf
data = [(1, '''<p><a>1</a></p>''')]
df = spark.createDataFrame(data, ("key", "value"))
# Define the schema using a DDL-formatted string
schema = "STRUCT<a: BIGINT>"
# Parse the XML column using the DDL-formatted schema
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

Output
[Row(xml=Row(a=1))]

Example 2: Parsing XML with a StructType schema

Python
import pyspark.sql.functions as sf
from pyspark.sql.types import StructType, LongType
data = [(1, '''<p><a>1</a></p>''')]
df = spark.createDataFrame(data, ("key", "value"))
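# Define the schema programmatically with a StructType
schema = StructType().add("a", LongType())
# Parse the XML column using the StructType schema
df.select(sf.from_xml(df.value, schema)).show()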

Output
+---------------+
|from_xml(value)|
+---------------+
|            {1}|
+---------------+

Example 3: Parsing XML with an ArrayType in the schema

Python
import pyspark.sql.functions as sf
data = [(1, '<p><a>1</a><a>2</a></p>')]
df = spark.createDataFrame(data, ("key", "value"))
# Define the schema with an Array type
schema = "STRUCT<a: ARRAY<BIGINT>>"
# Parse the XML column using the schema with an Array
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

Output
[Row(xml=Row(a=[1, 2]))]

Example 4: Parsing XML using schema_of_xml

Python
import pyspark.sql.functions as sf
# Sample data with an XML column
data = [(1, '<p><a>1</a><a>2</a></p>')]
df = spark.createDataFrame(data, ("key", "value"))
# Generate the schema from an example XML value
schema = sf.schema_of_xml(sf.lit(data[0][1]))
# Parse the XML column using the generated schema
df.select(sf.from_xml(df.value, schema).alias("xml")).collect()

Output
[Row(xml=Row(a=[1, 2]))]
