from_avro
Converts a binary column of Avro format into its corresponding catalyst value. The specified schema must match the read data, otherwise the behavior is undefined: it may fail or return an arbitrary result.
If jsonFormatSchema is not provided but both subject and schemaRegistryAddress are provided, the function converts a binary column of Schema Registry Avro format into its corresponding catalyst value.
Syntax
from pyspark.sql.avro.functions import from_avro
from_avro(data, jsonFormatSchema=None, options=None, subject=None, schemaRegistryAddress=None)
Parameters
Parameter | Type | Description |
|---|---|---|
|
| The binary column containing Avro-encoded data. |
| str, optional | The Avro schema in JSON string format. |
| dict, optional | Options to control how the Avro record is parsed and configuration for the schema registry client. |
| str, optional | The subject in Schema Registry that the data belongs to. |
| str, optional | The address (host and port) of the Schema Registry. |
Options
Option | Values | Description |
|---|---|---|
|
| Error handling mode. Default: |
|
| Compression codec for encoding Avro data. |
|
| Schema evolution mode. Default: |
| Range: | Maximum recursion depth along a single recursive path. Default: When a shared type is reachable from many distinct schema paths, schema expansion might cause the driver to run out of memory because this option bounds depth on one path only. To workaround:
|
Returns
pyspark.sql.Column: A new column containing the deserialized Avro data as the corresponding catalyst value.
Examples
Example 1: Deserializing an Avro binary column using a JSON schema
from pyspark.sql import Row
from pyspark.sql.avro.functions import from_avro, to_avro
data = [(1, Row(age=2, name='Alice'))]
df = spark.createDataFrame(data, ("key", "value"))
avro_df = df.select(to_avro(df.value).alias("avro"))
json_format_schema = '''{"type":"record","name":"topLevelRecord","fields":
[{"name":"avro","type":[{"type":"record","name":"value",
"namespace":"topLevelRecord","fields":[{"name":"age","type":["long","null"]},
{"name":"name","type":["string","null"]}]},"null"]}]}'''
avro_df.select(from_avro(avro_df.avro, json_format_schema).alias("value")).show(truncate=False)
+------------------+
|value |
+------------------+
|{{2, Alice}} |
+------------------+