json (DataFrameReader)
Loads JSON files and returns the results as a DataFrame. JSON Lines (newline-delimited JSON) is supported by default. For JSON with one record per file, set the multiLine option to True.
If schema is not specified, this function reads the input once to determine the input schema.
Syntax
json(path, schema=None, **options)
Parameters
Parameter | Type | Description |
|---|---|---|
| str, list, or RDD | A path to the JSON dataset, a list of paths, or an RDD of strings storing JSON objects. |
| StructType or str, optional | An optional input schema as a |
Returns
DataFrame
Examples
Write a DataFrame into a JSON file and read it back.
Python
import tempfile
with tempfile.TemporaryDirectory(prefix="json") as d:
spark.createDataFrame(
[{"age": 100, "name": "Hyukjin"}]
).write.mode("overwrite").format("json").save(d)
spark.read.json(d).show()
# +---+-------+
# |age| name|
# +---+-------+
# |100|Hyukjin|
# +---+-------+
Read JSON from multiple directories.
Python
from tempfile import TemporaryDirectory
with TemporaryDirectory(prefix="json2") as d1, TemporaryDirectory(prefix="json3") as d2:
spark.createDataFrame(
[{"age": 30, "name": "Bob"}]
).write.mode("overwrite").format("json").save(d1)
spark.createDataFrame(
[{"age": 25, "name": "Alice"}]
).write.mode("overwrite").format("json").save(d2)
spark.read.json([d1, d2]).show()
# +---+-----+
# |age| name|
# +---+-----+
# | 25|Alice|
# | 30| Bob|
# +---+-----+
Read JSON with a custom schema.
Python
import tempfile
with tempfile.TemporaryDirectory(prefix="json") as d:
spark.createDataFrame(
[{"age": 30, "name": "Bob"}]
).write.mode("overwrite").format("json").save(d)
spark.read.json(d, schema="name STRING, age INT").show()
# +----+---+
# |name|age|
# +----+---+
# | Bob| 30|
# +----+---+