repartitionByRange

Returns a new DataFrame partitioned by the given partitioning expressions. The resulting DataFrame is range partitioned.

Syntax

repartitionByRange(numPartitions: Union[int, "ColumnOrName"], *cols: "ColumnOrName")

Parameters

Parameter	Type	Description
`numPartitions`	int	can be an int to specify the target number of partitions or a Column. If it is a Column, it will be used as the first partitioning column. If not specified, the default number of partitions is used.
`cols`	str or Column	partitioning columns.

Returns

DataFrame: Repartitioned DataFrame.

Notes

At least one partition-by expression must be specified. When no explicit sort order is specified, "ascending nulls first" is assumed.

Due to performance reasons this method uses sampling to estimate the ranges. Hence, the output may not be consistent, since sampling can return different values. The sample size can be controlled by the config spark.sql.execution.rangeExchange.sampleSizePerPartition.

Examples

Python
from pyspark.sql import functions as sf
spark.createDataFrame(
    [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"]
).repartitionByRange(2, "age").select(
    "age", "name", sf.spark_partition_id()
).show()
# +---+-----+--------------------+
# |age| name|SPARK_PARTITION_ID()|
# +---+-----+--------------------+
# | 14|  Tom|                   0|
# | 16|  Bob|                   0|
# | 23|Alice|                   1|
# +---+-----+--------------------+

Syntax​

Parameters​

Returns​

Notes​

Examples​

Syntax

Parameters

Returns

Notes

Examples