foreachBatch (DataStreamWriter)
Sets the output of the streaming query to be processed using the provided function. Supported only in micro-batch execution mode (that is, when the trigger is not continuous). In every micro-batch, the provided function is called with the output rows as a DataFrame and the batch identifier. The batch ID can be used to deduplicate and transactionally write the output to external systems.
Syntax
foreachBatch(func)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `func` | callable | A function that takes a DataFrame and a batch ID (int) as input. |
Returns
DataStreamWriter
Notes
In Spark Connect mode, the provided function does not have access to variables defined outside of it.
Examples
Python
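import time

df = spark.readStream.format("rate").load()

def func(batch_df, batch_id):
    # Called once per micro-batch with that batch's rows and its ID.
    batch_df.collect()

q = df.writeStream.foreachBatch(func).start()
time.sleep(3)  # Let the query run briefly before stopping it.
q.stop()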
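
The description above notes that the batch ID can be used to deduplicate writes to external systems. A minimal sketch of that idea follows. The `IdempotentSink` class, the module-level `sink` object, and the `write_batch` function are hypothetical names introduced for illustration and are not part of the PySpark API. The in-memory sink stands in for a real external system, which would need to store the rows and the batch ID atomically. The sketch also assumes the function is not running in Spark Connect mode, because it refers to a variable defined outside the function.

```python
class IdempotentSink:
    """Hypothetical in-memory stand-in for an external system that
    remembers which batch IDs it has already committed."""

    def __init__(self):
        self.committed_batches = set()
        self.rows = []

    def write(self, rows, batch_id):
        # Skip the write if this batch ID was already committed, for example
        # when a micro-batch is re-executed after a failure and restart.
        if batch_id in self.committed_batches:
            return False
        # A real sink would persist the rows and the batch ID atomically.
        self.rows.extend(rows)
        self.committed_batches.add(batch_id)
        return True


sink = IdempotentSink()


def write_batch(batch_df, batch_id):
    # collect() brings the micro-batch to the driver, so this pattern is
    # only suitable for small batches.
    sink.write(batch_df.collect(), batch_id)


# Wiring into a streaming query (requires an active SparkSession named `spark`):
# q = spark.readStream.format("rate").load().writeStream.foreachBatch(write_batch).start()
```

Because the sink checks the batch ID before writing, calling `write_batch` a second time with the same `batch_id` leaves the stored rows unchanged.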