
json (DataFrameReader)

Loads JSON files and returns the results as a DataFrame. JSON Lines (newline-delimited JSON) is supported by default. For JSON with one record per file, set the multiLine option to True.

If schema is not specified, this function reads the input once to determine the input schema.

Syntax

json(path, schema=None, **options)

Parameters

path : str, list, or RDD
    A path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects.

schema : StructType or str, optional
    An optional input schema as a StructType object or a DDL-formatted string (for example, 'col0 INT, col1 DOUBLE').

Returns

DataFrame

Examples

Write a DataFrame into a JSON file and read it back.

Python
import tempfile

with tempfile.TemporaryDirectory(prefix="json") as d:
    # Write a DataFrame into a JSON file
    spark.createDataFrame(
        [{"age": 100, "name": "Hyukjin"}]
    ).write.mode("overwrite").format("json").save(d)

    # Read the JSON file as a DataFrame
    spark.read.json(d).show()
    # +---+-------+
    # |age|   name|
    # +---+-------+
    # |100|Hyukjin|
    # +---+-------+

Read JSON from multiple directories.

Python
from tempfile import TemporaryDirectory

with TemporaryDirectory(prefix="json2") as d1, TemporaryDirectory(prefix="json3") as d2:
    # Write DataFrames into two separate JSON directories
    spark.createDataFrame(
        [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d1)
    spark.createDataFrame(
        [{"age": 25, "name": "Alice"}]
    ).write.mode("overwrite").format("json").save(d2)

    # Read both directories with a single call
    spark.read.json([d1, d2]).show()
    # +---+-----+
    # |age| name|
    # +---+-----+
    # | 25|Alice|
    # | 30|  Bob|
    # +---+-----+

Read JSON with a custom schema.

Python
import tempfile

with tempfile.TemporaryDirectory(prefix="json") as d:
    spark.createDataFrame(
        [{"age": 30, "name": "Bob"}]
    ).write.mode("overwrite").format("json").save(d)

    # Specify the schema as a DDL-formatted string
    spark.read.json(d, schema="name STRING, age INT").show()
    # +----+---+
    # |name|age|
    # +----+---+
    # | Bob| 30|
    # +----+---+