to_avro

Converts a column into binary of Avro format.

If both subject and schemaRegistryAddress are provided, the function converts a column into binary of Schema Registry Avro format. The input data schema must have been registered to the given subject in Schema Registry, or the query fails at runtime.

Syntax

Python
from pyspark.sql.avro.functions import to_avro

to_avro(data, jsonFormatSchema=None, subject=None, schemaRegistryAddress=None, options=None)

Parameters

Parameter	Type	Description
`data`	`pyspark.sql.Column` or str	The data column to serialize.
`jsonFormatSchema`	str, optional	User-specified output Avro schema in JSON string format.
`subject`	`pyspark.sql.Column` or str, optional	The subject in Schema Registry that the data belongs to.
`schemaRegistryAddress`	str, optional	The address (host and port) of the Schema Registry.
`options`	dict, optional	Options to control how the Avro record is serialized and configuration for the schema registry client.

Returns

pyspark.sql.Column: A new column containing the Avro-encoded binary data.

Examples

Example 1: Converting a string column to Avro binary format

Python
from pyspark.sql.avro.functions import to_avro

data = ['SPADES']
df = spark.createDataFrame(data, "string")
df.select(to_avro(df.value).alias("avro")).show(truncate=False)

Output
+--------------------+
|avro                |
+--------------------+
|[00 0C 53 50 41 4...|
+--------------------+

Example 2: Converting a string column to Avro using a custom JSON schema

Python
from pyspark.sql.avro.functions import to_avro

data = ['SPADES']
df = spark.createDataFrame(data, "string")
json_format_schema = '''["null", {"type": "enum", "name": "value",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}]'''
df.select(to_avro(df.value, json_format_schema).alias("avro")).show(truncate=False)

Output
+--------+
|avro    |
+--------+
|[02 00] |
+--------+

Syntax​

Parameters​

Returns​

Examples​

Syntax

Parameters

Returns

Examples