
to_json

Converts a column containing a StructType, ArrayType, MapType, or VariantType into a JSON string. Throws an exception for unsupported types.

Syntax

Python
from pyspark.sql import functions as sf

sf.to_json(col, options=None)

Parameters

col : pyspark.sql.Column or str
    Name of the column containing a struct, an array, a map, or a variant object.

options : dict, optional
    Options to control converting. Accepts the same options as the JSON datasource. Additionally, the function supports the pretty option, which enables pretty JSON generation.

Returns

pyspark.sql.Column: the JSON object as a string column.

Examples

Example 1: Converting a StructType column to JSON

Python
import pyspark.sql.functions as sf
from pyspark.sql import Row
data = [(1, Row(age=2, name='Alice'))]
df = spark.createDataFrame(data, ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
Output
+------------------------+
|json                    |
+------------------------+
|{"age":2,"name":"Alice"}|
+------------------------+

Example 2: Converting an ArrayType column to JSON

Python
import pyspark.sql.functions as sf
from pyspark.sql import Row
data = [(1, [Row(age=2, name='Alice'), Row(age=3, name='Bob')])]
df = spark.createDataFrame(data, ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
Output
+-------------------------------------------------+
|json                                             |
+-------------------------------------------------+
|[{"age":2,"name":"Alice"},{"age":3,"name":"Bob"}]|
+-------------------------------------------------+

Example 3: Converting a MapType column to JSON

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, {"name": "Alice"})], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
Output
+----------------+
|json            |
+----------------+
|{"name":"Alice"}|
+----------------+

Example 4: Converting a VariantType column to JSON

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, '{"name": "Alice"}')], ("key", "value"))
df.select(sf.to_json(sf.parse_json(df.value)).alias("json")).show(truncate=False)
Output
+----------------+
|json            |
+----------------+
|{"name":"Alice"}|
+----------------+

Example 5: Converting a nested MapType column to JSON

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, [{"name": "Alice"}, {"name": "Bob"}])], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
Output
+---------------------------------+
|json                             |
+---------------------------------+
|[{"name":"Alice"},{"name":"Bob"}]|
+---------------------------------+

Example 6: Converting a simple ArrayType column to JSON

Python
import pyspark.sql.functions as sf
df = spark.createDataFrame([(1, ["Alice", "Bob"])], ("key", "value"))
df.select(sf.to_json(df.value).alias("json")).show(truncate=False)
Output
+---------------+
|json           |
+---------------+
|["Alice","Bob"]|
+---------------+

Example 7: Converting to JSON with specified options

Python
import pyspark.sql.functions as sf
df = spark.sql("SELECT (DATE('2022-02-22'), 1) AS date")
json1 = sf.to_json(df.date)
json2 = sf.to_json(df.date, {"dateFormat": "yyyy/MM/dd"})
df.select("date", json1, json2).show(truncate=False)
Output
+---------------+------------------------------+------------------------------+
|date           |to_json(date)                 |to_json(date)                 |
+---------------+------------------------------+------------------------------+
|{2022-02-22, 1}|{"col1":"2022-02-22","col2":1}|{"col1":"2022/02/22","col2":1}|
+---------------+------------------------------+------------------------------+