DataStreamReader
Interface used to load a streaming DataFrame from external storage systems (for example, file systems and key-value stores). Use spark.readStream to access this.
Syntax
# Access through SparkSession
spark.readStream
Methods
| Method | Description |
|---|---|
| `format(source)` | Specifies the input data source format. |
| `schema(schema)` | Specifies the schema of the streaming DataFrame. |
| `option(key, value)` | Adds an input option for the underlying data source. |
| `options(**options)` | Adds multiple input options for the underlying data source. |
| `load(path=None)` | Loads the streaming DataFrame from the given path and returns it. |
| `json(path)` | Loads a JSON file stream and returns a DataFrame. |
| `orc(path)` | Loads an ORC file stream and returns a DataFrame. |
| `parquet(path)` | Loads a Parquet file stream and returns a DataFrame. |
| `text(path)` | Loads a text file stream and returns a DataFrame. |
| `csv(path)` | Loads a CSV file stream and returns a DataFrame. |
| `xml(path)` | Loads an XML file stream and returns a DataFrame. |
| `table(tableName)` | Loads a streaming Delta table and returns a DataFrame. |
| | Assigns a name to the streaming source for checkpoint evolution. |
| | Returns row-level changes (Change Data Capture) from the specified table as a streaming DataFrame. |
Examples
spark.readStream
# <...streaming.readwriter.DataStreamReader object ...>
Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.
import time

# Generate rows continuously with the built-in "rate" source.
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")

# Start the query, let it run briefly, then stop it.
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()