rangeBetween (Window)

Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).

Both start and end are offsets relative to the current row. For a range frame, the offset is applied to the current row's ORDER BY value: 0 means "the current row's value", -1 means one less than it, and 5 means five greater than it.

A range-based boundary is based on the actual value of the ORDER BY expression(s). An offset alters the value of the ORDER BY expression — for example, if the current ORDER BY value is 10 and the lower bound offset is -3, the resulting lower bound is 7. Because of this, range-based frames require exactly one ORDER BY expression with a numerical data type, unless the offset is unbounded.
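To make the distinction concrete, here is a toy sketch in plain Python (not Spark code; the helper names are hypothetical) contrasting how a range-based frame and a row-based frame select rows for each position in an ordered column:

```python
def range_frame_sum(values, start, end):
    # Range frame: offsets shift the current ORDER BY *value*;
    # every row whose value falls in [v + start, v + end] is included,
    # so ties (peer rows) are always grouped together.
    # `values` is assumed already sorted, as if by ORDER BY.
    out = []
    for v in values:
        lo, hi = v + start, v + end
        out.append(sum(x for x in values if lo <= x <= hi))
    return out

def rows_frame_sum(values, start, end):
    # Row frame: offsets are positional, counted in physical rows,
    # regardless of the values involved.
    out = []
    for i in range(len(values)):
        lo = max(0, i + start)
        hi = min(len(values) - 1, i + end)
        out.append(sum(values[lo:hi + 1]))
    return out

vals = [1, 1, 2]
print(range_frame_sum(vals, 0, 1))  # [4, 4, 2] — equal values are peers
print(rows_frame_sum(vals, 0, 1))   # [2, 3, 2] — strictly positional
```

This mirrors the `(Window.currentRow, 1)` frame used in the example below: with a range frame, both rows with value 1 see the same frame `[1, 2]`, whereas a row frame of `(0, 1)` would give them different sums.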

Syntax

Window.rangeBetween(start, end)

Parameters

start : int
    Boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding, or any value less than or equal to max(-sys.maxsize, -9223372036854775808).

end : int
    Boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing, or any value greater than or equal to min(sys.maxsize, 9223372036854775807).

Returns

WindowSpec

Notes

Use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values rather than using integral values directly.

Examples

Python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])

# Calculate the sum of id where the id value falls within [current id, current id + 1]
# in each category partition.
window = Window.partitionBy("category").orderBy("id").rangeBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category").show()
# +---+--------+---+
# | id|category|sum|
# +---+--------+---+
# |  1|       a|  4|
# |  1|       a|  4|
# |  1|       b|  3|
# |  2|       a|  2|
# |  2|       b|  5|
# |  3|       b|  3|
# +---+--------+---+