
json (DataFrameReader)

Loads JSON files and returns the results as a DataFrame. JSON Lines (newline-delimited JSON) is supported by default. For JSON with one record per file, set the multiLine option to True.

If schema is not specified, this function reads the input once to determine the input schema.

Syntax

json(path, schema=None, **options)

Parameters

path : str, list, or RDD
    A path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects.

schema : StructType or str, optional
    An optional input schema as a StructType object or a DDL-formatted string (for example, 'col0 INT, col1 DOUBLE').

Returns

DataFrame

Examples

Write a DataFrame into a JSON file and read it back.

Python
import tempfile

with tempfile.TemporaryDirectory(prefix="json") as d:
    # Write a DataFrame into a JSON file
    spark.createDataFrame(
        [{"age": 100, "name": "Hyukjin"}]
    ).write.mode("overwrite").format("json").save(d)

    # Read the JSON file as a DataFrame
    spark.read.json(d).show()
    # +---+-------+
    # |age|   name|
    # +---+-------+
    # |100|Hyukjin|
    # +---+-------+

Read JSON from multiple directories.

Python
from tempfile import TemporaryDirectory

with TemporaryDirectory(prefix="json2") as d1, TemporaryDirectory(prefix="json3") as d2:
    # Write DataFrames into two separate JSON directories
    spark.createDataFrame(
        [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d1)
    spark.createDataFrame(
        [{"age": 25, "name": "Alice"}]
    ).write.mode("overwrite").format("json").save(d2)

    # Read both directories with a single call
    spark.read.json([d1, d2]).show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # | 25|Alice|
    # | 30|  Bob|
    # +---+-----+

Read JSON with a custom schema.

Python
import tempfile

with tempfile.TemporaryDirectory(prefix="json") as d:
    spark.createDataFrame(
        [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d)

    # Specify the schema as a DDL-formatted string
    spark.read.json(d, schema="name STRING, age INT").show()
    # +----+---+
    # |name|age|
    # +----+---+
    # | Bob| 30|
    # +----+---+