foreachBatch (DataStreamWriter)

Sets the output of the streaming query to be processed using the provided function. Supported only in micro-batch execution mode (that is, when the trigger is not continuous). In every micro-batch, the provided function is called with the output rows as a DataFrame and the batch identifier. The batch ID can be used to deduplicate and transactionally write the output to external systems.

Syntax

foreachBatch(func)

Parameters

Parameter	Type	Description
`func`	callable	A function that takes a DataFrame and a batch ID (int) as input.

Returns

DataStreamWriter

Notes

In Spark Connect mode, the provided function does not have access to variables defined outside of it.

Examples

Python
import time
df = spark.readStream.format("rate").load()

def func(batch_df, batch_id):
    batch_df.collect()

q = df.writeStream.foreachBatch(func).start()
time.sleep(3)
q.stop()

Syntax​

Parameters​

Returns​

Notes​

Examples​

Syntax

Parameters

Returns

Notes

Examples