partitions (DataSourceReader)

Returns a sequence of partitions for this data source.

Partitions are used to split data reading operations into parallel tasks. If this method returns N partitions, the query planner will create N tasks. Each task will execute read() in parallel, using the respective partition value to read the data.

This method is called once during query planning. By default, it returns a single partition with the value None. Subclasses can override this method to return multiple partitions.

It's recommended to override this method for better performance when reading large datasets.

Syntax

partitions()

Returns

Sequence[InputPartition]

A sequence of partitions for this data source. Each partition value must be an instance of InputPartition or a subclass of it.

Notes

All partition values must be picklable objects.
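The sketch below is one way to check a partition class against this requirement before handing it to Spark. It is an illustration under assumptions, not part of the API: `FilePartition`, `LockedPartition`, and `unpicklable_fields` are hypothetical names, the check uses the standard-library `pickle` module (Spark may use a different serializer internally), and a minimal stand-in for `InputPartition` is defined when `pyspark.sql.datasource` cannot be imported.

```python
import pickle
import threading

try:
    from pyspark.sql.datasource import InputPartition
except ImportError:
    # Stand-in so the sketch runs without Spark installed (assumption).
    class InputPartition:
        def __init__(self, value):
            self.value = value


class FilePartition(InputPartition):
    """Hypothetical partition that carries only plain data."""

    def __init__(self, path, offset):
        self.path = path
        self.offset = offset


class LockedPartition(InputPartition):
    """Hypothetical partition that holds a lock, which cannot be pickled."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()


def unpicklable_fields(partition):
    """Return the names of instance attributes that fail to pickle."""
    bad = []
    for name, value in vars(partition).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


good = unpicklable_fields(FilePartition("/data/part-0", 0))
bad = unpicklable_fields(LockedPartition("/data/part-0"))
```

Keeping only plain data (paths, offsets, ranges) in the partition and opening connections or file handles inside read() avoids this class of failure.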

Examples

Returns a list of partitions with integer values:

Python
def partitions(self):
    return [InputPartition(1), InputPartition(2), InputPartition(3)]

Returns a list of partitions with string values:

Python
def partitions(self):
    return [InputPartition("a"), InputPartition("b"), InputPartition("c")]

Returns a list of range partitions, using a custom InputPartition subclass:

Python
class RangeInputPartition(InputPartition):
    def __init__(self, start, end):
        self.start = start
        self.end = end

def partitions(self):
    return [RangeInputPartition(1, 3), RangeInputPartition(5, 10)]
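
Pairs partitions() with read(), as a minimal sketch of how each partition drives one task. `RangeReader` and `simulate_scan` are hypothetical names introduced for illustration; `simulate_scan` loops over the partitions sequentially, whereas Spark would run one task per partition in parallel. Stand-ins for `InputPartition` and `DataSourceReader` are defined only when `pyspark.sql.datasource` cannot be imported, and the sketch assumes `total` and `num_partitions` are positive.

```python
try:
    from pyspark.sql.datasource import DataSourceReader, InputPartition
except ImportError:
    # Stand-ins so the sketch runs without Spark installed (assumption).
    class InputPartition:
        def __init__(self, value):
            self.value = value

    class DataSourceReader:
        def partitions(self):
            return [InputPartition(None)]

        def read(self, partition):
            raise NotImplementedError


class RangeInputPartition(InputPartition):
    def __init__(self, start, end):
        self.start = start
        self.end = end


class RangeReader(DataSourceReader):
    """Hypothetical reader that emits one row per integer in [0, total)."""

    def __init__(self, total, num_partitions):
        self.total = total
        self.num_partitions = num_partitions

    def partitions(self):
        # Ceiling division so the last partition absorbs any remainder.
        step = -(-self.total // self.num_partitions)
        return [
            RangeInputPartition(start, min(start + step, self.total))
            for start in range(0, self.total, step)
        ]

    def read(self, partition):
        # Each task receives exactly one partition and yields its rows.
        for i in range(partition.start, partition.end):
            yield (i,)


def simulate_scan(reader):
    """Sequential stand-in for the planner: one read() call per partition."""
    rows = []
    for partition in reader.partitions():
        rows.extend(reader.read(partition))
    return rows


reader = RangeReader(total=10, num_partitions=3)
bounds = [(p.start, p.end) for p in reader.partitions()]
rows = simulate_scan(reader)
```

Because every row belongs to exactly one partition, the union of the per-task outputs covers the full range without duplicates.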