partitioning.bucket

A transform for any type that partitions by a hash of the input column.

note

This function can be used only in combination with the DataFrameWriterV2.partitionedBy method.

Syntax

Python
from pyspark.sql.functions import partitioning

partitioning.bucket(numBuckets, col)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| numBuckets | pyspark.sql.Column or int | The number of buckets. |
| col | pyspark.sql.Column or str | Target column to work on; columns of any type can be bucketed. |
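
Because numBuckets accepts either an int or a Column, and col accepts either a column name or a Column object, the calls below build the same transform. A minimal sketch, assuming an existing DataFrame df with a ts column:

Python
from pyspark.sql.functions import lit, partitioning

# Equivalent ways to request 16 hash buckets on `ts`:
partitioning.bucket(16, "ts")            # int + column name
partitioning.bucket(lit(16), df["ts"])   # literal Column + Column object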

Examples

Python
from pyspark.sql.functions import partitioning

# `df` is any existing DataFrame with a `ts` column; the target catalog
# must support DataFrameWriterV2 (e.g., an Iceberg catalog).
df.writeTo("catalog.db.table").partitionedBy(
    partitioning.bucket(42, "ts")
).createOrReplace()
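
Unlike the date and timestamp transforms (years, months, days, hours), bucket bounds the number of partitions up front: each row is assigned to one of numBuckets partitions based on the hash of its column value, which makes it a practical choice for high-cardinality columns such as IDs or timestamps.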