Skip to main content

write (DataSourceStreamArrowWriter)

Writes an iterator of PyArrow RecordBatch objects to the streaming sink.

This method is called on executors to write data to the streaming data sink in each microbatch. It accepts an iterator of PyArrow RecordBatch objects and returns a single row representing a commit message, or None if there is no commit message.

The driver collects commit messages, if any, from all executors and passes them to the commit() method if all tasks run successfully. If any task fails, the abort() method will be called with the collected commit messages.

Syntax

write(iterator: Iterator[RecordBatch])

Parameters

Parameter

Type

Description

iterator

Iterator[RecordBatch]

An iterator of PyArrow RecordBatch objects representing the input data.

Returns

WriterCommitMessage

A serializable commit message.

Examples

Python
from dataclasses import dataclass

@dataclass
class MyCommitMessage(WriterCommitMessage):
num_rows: int
batch_id: int

def write(self, iterator: Iterator["RecordBatch"]) -> "WriterCommitMessage":
total_rows = 0
for batch in iterator:
total_rows += len(batch)
return MyCommitMessage(num_rows=total_rows, batch_id=self.current_batch_id)