Skip to main content

DataStreamWriter

Interface used to write a streaming DataFrame to external storage systems (for example, file systems and key-value stores). Use df.writeStream to access this.

Syntax

Python
# Access through DataFrame
df.writeStream

Methods

Method

Description

outputMode(outputMode)

Specifies how data of a streaming DataFrame is written to the sink. Options are append, complete, and update.

format(source)

Specifies the output data source format.

option(key, value)

Adds an output option for the underlying data source.

options(**options)

Adds multiple output options for the underlying data source.

partitionBy(*cols)

Partitions the output by the given columns on the file system.

clusterBy(*cols)

Clusters the output by the given columns.

queryName(queryName)

Specifies the name of the streaming query.

trigger(**kwargs)

Sets the trigger for the streaming query execution.

foreach(f)

Sets the output of the streaming query to be processed by the given function or object.

foreachBatch(func)

Sets the output of each microbatch to be processed by the given function.

start(path)

Starts the execution of the streaming query and returns a StreamingQuery object.

table(tableName)

Alias for toTable(). Writes data to the specified table and returns a StreamingQuery object.

toTable(tableName)

Starts the execution of the streaming query, continually outputting results to the given table.

Examples

Load a rate stream, apply a transformation, write to the console, and stop after 3 seconds.

Python
import time
df = spark.readStream.format("rate").load()
df = df.selectExpr("value % 3 as v")
q = df.writeStream.format("console").start()
time.sleep(3)
q.stop()