width_bucket

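Returns the number of the equal-width histogram bucket into which the value of this expression falls. The input arguments must satisfy all of the following conditions; otherwise, the function returns null: numBucket must be greater than zero and less than Long.MaxValue; v, min, and max must not be NaN; min must not equal max; min and max must be finite. Supports Spark Connect.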

For the corresponding Databricks SQL function, see width_bucket function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.width_bucket(v=<v>, min=<min>, max=<max>, numBucket=<numBucket>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| v | pyspark.sql.Column or column name | Value for which to compute the bucket number in the histogram |
| min | pyspark.sql.Column or column name | Minimum value of the histogram |
| max | pyspark.sql.Column or column name | Maximum value of the histogram |
| numBucket | pyspark.sql.Column, column name, or int | Number of buckets |

Returns

pyspark.sql.Column: the bucket number into which the value would fall after being evaluated

Examples

Python
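from pyspark.databricks.sql import functions as dbf

# Each row holds a value, the histogram bounds, and the number of buckets.
df = spark.createDataFrame(
    [
        (5.3, 0.2, 10.6, 5),
        (-2.1, 1.3, 3.4, 3),
        (8.1, 0.0, 5.7, 4),
        (-0.9, 5.2, 0.5, 2),
    ],
    ['v', 'min', 'max', 'n'],
)

# Append the bucket number computed per row from its own bounds and bucket count.
df.select("*", dbf.width_bucket('v', 'min', 'max', 'n')).show()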
Output
+----+---+----+---+----------------------------+
|   v|min| max|  n|width_bucket(v, min, max, n)|
+----+---+----+---+----------------------------+
| 5.3|0.2|10.6|  5|                           3|
|-2.1|1.3| 3.4|  3|                           0|
| 8.1|0.0| 5.7|  4|                           5|
|-0.9|5.2| 0.5|  2|                           3|
+----+---+----+---+----------------------------+
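
The results above include buckets 0 and n + 1 and a row where min is greater than max, so it may help to see the arithmetic spelled out. The following is a minimal pure-Python sketch of an equal-width bucket computation that reproduces the four results shown. It is not the Spark implementation: the function name `width_bucket_sketch`, its parameter names, and the exact set of null conditions it checks are assumptions made for illustration, modeled on the behavior described for the Apache Spark function.

```python
import math
from typing import Optional


def width_bucket_sketch(v: float, lo: float, hi: float, num_bucket: int) -> Optional[int]:
    """Illustrative equal-width bucket computation (hypothetical helper, not the Spark API)."""
    # Assumed invalid-input conditions: return None where the SQL function would return null.
    if (
        num_bucket <= 0
        or math.isnan(v)
        or lo == hi
        or not math.isfinite(lo)  # also rejects NaN bounds
        or not math.isfinite(hi)
    ):
        return None

    if lo < hi:
        # Ascending histogram: values below the range go to bucket 0,
        # values at or above the upper bound go to the overflow bucket.
        if v < lo:
            return 0
        if v >= hi:
            return num_bucket + 1
        return int(num_bucket * (v - lo) / (hi - lo)) + 1

    # Descending histogram (lo > hi): the same logic with the direction reversed.
    if v > lo:
        return 0
    if v <= hi:
        return num_bucket + 1
    return int(num_bucket * (lo - v) / (lo - hi)) + 1


rows = [
    (5.3, 0.2, 10.6, 5),
    (-2.1, 1.3, 3.4, 3),
    (8.1, 0.0, 5.7, 4),
    (-0.9, 5.2, 0.5, 2),
]
results = [width_bucket_sketch(*row) for row in rows]
print(results)
```

In the first row, 5 * (5.3 - 0.2) / (10.6 - 0.2) is about 2.45, which truncates to 2 and yields bucket 3. The second row falls below the lower bound (bucket 0). The third row lies at or above the upper bound (bucket n + 1 = 5). In the fourth row the bounds are reversed and the value lies beyond the far bound, giving bucket n + 1 = 3.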