DataStreamReader

Interface used to load a streaming DataFrame from external storage systems (for example, file systems and key-value stores). Use spark.readStream to access this.

Syntax

Python
# Access through SparkSession
spark.readStream

Methods

format(source)
    Specifies the input data source format.

schema(schema)
    Specifies the schema of the streaming DataFrame.

option(key, value)
    Adds an input option for the underlying data source.

options(**options)
    Adds multiple input options for the underlying data source.

load(path)
    Loads the streaming DataFrame from the given path and returns it.

json(path)
    Loads a JSON file stream and returns a DataFrame.

orc(path)
    Loads an ORC file stream and returns a DataFrame.

parquet(path)
    Loads a Parquet file stream and returns a DataFrame.

text(path)
    Loads a text file stream and returns a DataFrame.

csv(path)
    Loads a CSV file stream and returns a DataFrame.

xml(path)
    Loads an XML file stream and returns a DataFrame.

table(tableName)
    Loads a streaming Delta table and returns a DataFrame.

name(source_name)
    Assigns a name to the streaming source for checkpoint evolution.

changes(tableName)
    Returns row-level changes (Change Data Capture) from the specified table as a streaming DataFrame.

Examples

Python
spark.readStream
# <...streaming.readwriter.DataStreamReader object ...>

Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.

Python
import time

# Generate a rate stream: one row per second with `timestamp` and `value` columns
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")

# Write to the console sink, then stop the query after 3 seconds
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()