foreachBatch (DataStreamWriter)

Sets the output of the streaming query to be processed using the provided function. Supported only in micro-batch execution mode (that is, when the trigger is not continuous). In every micro-batch, the provided function is called with the output rows as a DataFrame and the batch identifier. The batch ID can be used to deduplicate and transactionally write the output to external systems.

Syntax

foreachBatch(func)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| func | callable | A function that takes a DataFrame and a batch ID (int) as input. |

Returns

DataStreamWriter

Notes

In Spark Connect mode, the provided function does not have access to variables defined outside of it.
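
One way to stay within this constraint is to keep the function self-contained, so that it imports and creates everything it needs in its own body. The following is a minimal sketch under that assumption; the function name `log_batch` and the logger name `foreach_batch_demo` are hypothetical and chosen only for illustration.

```python
def log_batch(batch_df, batch_id):
    # Import and build helpers inside the function instead of reading
    # them from the enclosing script (hypothetical example).
    import logging

    logger = logging.getLogger("foreach_batch_demo")  # hypothetical logger name
    row_count = batch_df.count()  # number of rows in this micro-batch
    logger.info("batch %d contained %d rows", batch_id, row_count)
```

The function would be passed to the writer in the usual way, for example `df.writeStream.foreachBatch(log_batch).start()`.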

Examples

Python
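import time

# Assumes an active SparkSession named `spark`.
# The built-in "rate" source generates rows continuously, which is handy for testing.
df = spark.readStream.format("rate").load()

def func(batch_df, batch_id):
    # Called once per micro-batch with that batch's rows and its batch ID.
    batch_df.collect()

q = df.writeStream.foreachBatch(func).start()
time.sleep(3)  # give the query a few seconds to process micro-batches
q.stop()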
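
The description mentions that the batch ID can be used to deduplicate output. The sketch below illustrates one possible shape of that idea: record which batch IDs have been written and skip any batch that is delivered again. The names `committed_batches`, `written_rows`, `write_once`, and `start_query` are hypothetical, and the in-memory set and list are stand-ins for an external system that would store the data and the batch ID in one transaction. It also assumes a mode in which the function can see module-level variables, which the Notes section says is not the case under Spark Connect.

```python
# Hypothetical stand-ins for an external system. A real sink would persist
# both the rows and the committed batch ID, ideally in a single transaction.
committed_batches = set()
written_rows = []

def write_once(batch_df, batch_id):
    """Write a micro-batch unless its batch ID was already committed."""
    if batch_id in committed_batches:
        # Same batch delivered again (for example after a retry): skip it.
        return
    rows = batch_df.collect()
    written_rows.extend(rows)        # stand-in for the external write
    committed_batches.add(batch_id)  # mark the batch as done

def start_query(df):
    """Attach write_once to a streaming DataFrame and start the query."""
    return df.writeStream.foreachBatch(write_once).start()
```

With the streaming DataFrame from the example above, `q = start_query(df)` would start the query. Because the in-memory ledger is lost when the process exits, this only demonstrates the control flow, not a durable guarantee.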