json (DataStreamReader)

Loads a JSON file stream and returns the results as a DataFrame. JSON Lines (newline-delimited JSON) is supported by default. For JSON with one record per file, set the multiLine option to true. If schema is not specified, the input schema is inferred from the data.
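To make the two input layouts concrete, the sketch below writes one JSON Lines file and one single-record file that spans several lines, then parses each with Python's standard `json` module. It only illustrates the file formats; it is not how Spark parses them. The helper names `read_json_lines` and `read_single_document`, the file names, and the second sample record are hypothetical.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def read_single_document(path):
    """Parse a file holding one JSON record that may span several lines."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory(prefix="json") as d:
    # JSON Lines layout: every line is a complete record (the default case).
    lines_path = os.path.join(d, "people.jsonl")
    with open(lines_path, "w", encoding="utf-8") as f:
        f.write('{"age": 100, "name": "Hyukjin Kwon"}\n')
        f.write('{"age": 42, "name": "Example Person"}\n')  # made-up record

    # One record per file, pretty-printed across lines (the multiLine case).
    multi_path = os.path.join(d, "person.json")
    with open(multi_path, "w", encoding="utf-8") as f:
        f.write('{\n  "age": 100,\n  "name": "Hyukjin Kwon"\n}\n')

    line_records = read_json_lines(lines_path)
    multi_record = read_single_document(multi_path)

print(len(line_records))
print(multi_record["name"])
```

In a streaming read, the second layout would be requested by passing the option as a keyword argument, for example `spark.readStream.json(path, schema="age INT, name STRING", multiLine=True)`, assuming an active SparkSession named `spark`.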

Syntax

json(path, schema=None, **options)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| path | str | Path to the JSON dataset. |
| schema | StructType or str, optional | Schema as a StructType or DDL-formatted string (for example, col0 INT, col1 DOUBLE). |

Returns

DataFrame

Examples

Load a stream from a temporary JSON file:

Python
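import tempfile
import time

# Assumes an active SparkSession named `spark`.
with tempfile.TemporaryDirectory(prefix="json") as d:
    # Write a one-row DataFrame as JSON files into the temporary directory.
    spark.createDataFrame(
        [(100, "Hyukjin Kwon")], ["age", "name"]
    ).write.mode("overwrite").format("json").save(d)

    # Read the directory as a stream with an explicit schema and
    # print each micro-batch to the console.
    q = spark.readStream.schema(
        "age INT, name STRING"
    ).json(d).writeStream.format("console").start()
    time.sleep(3)  # Let the query run briefly before stopping it.
    q.stop()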