partitioning.bucket
A partition transform that buckets rows by a hash of the input column. It works with input columns of any type.
note
This function can be used only in combination with the DataFrameWriterV2.partitionedBy method.
Syntax
Python
from pyspark.sql.functions import partitioning
partitioning.bucket(numBuckets, col)
Parameters
| Parameter | Type | Description |
|---|---|---|
| numBuckets | Column or int | The number of buckets. |
| col | Column or str | The target column to work on. |
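Because numBuckets also accepts a Column, the bucket count can be passed as a literal column expression instead of a plain int. A minimal sketch, assuming a DataFrame df and a hypothetical table catalog.db.table:

Python

from pyspark.sql.functions import lit, partitioning

# Equivalent to partitioning.bucket(16, "id"): the bucket count is
# passed as a literal Column rather than a Python int.
df.writeTo("catalog.db.table").partitionedBy(
    partitioning.bucket(lit(16), df.id)
).createOrReplace()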
Examples
Python
from pyspark.sql.functions import partitioning

# Create or replace the table, partitioned into 42 buckets
# by a hash of the "ts" column.
df.writeTo("catalog.db.table").partitionedBy(
    partitioning.bucket(42, "ts")
).createOrReplace()
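DataFrameWriterV2.partitionedBy accepts multiple transform expressions, so a bucket transform can be combined with a time-based transform such as partitioning.days. A sketch under the same assumptions (a DataFrame df with "ts" and "id" columns, and a hypothetical table name):

Python

from pyspark.sql.functions import partitioning

# Partition by day of the "ts" column first, then hash-bucket rows
# by "id" within each day.
df.writeTo("catalog.db.table").partitionedBy(
    partitioning.days("ts"),
    partitioning.bucket(8, "id"),
).createOrReplace()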