
rowsBetween (Window)

Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).

Both start and end are relative positions from the current row. For example, 0 means "current row", -1 means the row before the current row, and 5 means the fifth row after the current row.

A row-based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row where the frame starts or ends.
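To make the offset arithmetic concrete, here is a minimal pure-Python sketch of how a row-based frame could be resolved inside one ordered partition. The helper frame_rows is hypothetical and only illustrates the inclusive-boundary semantics described above; it is not how Spark evaluates windows internally.

```python
def frame_rows(partition, current, start, end):
    """Return the rows of an ordered partition that fall in the frame
    [current + start, current + end], both ends inclusive.

    Offsets that point outside the partition are clipped to its edges.
    """
    first = max(current + start, 0)
    last = min(current + end, len(partition) - 1)
    if first > last:
        # The whole frame lies outside the partition.
        return []
    return partition[first:last + 1]


ids = [10, 20, 30, 40]  # one partition, already ordered

# start=0, end=1: the current row and the row after it.
sums = [sum(frame_rows(ids, i, 0, 1)) for i in range(len(ids))]
print(sums)  # [30, 50, 70, 40]

# start=-1, end=0: the row before the current row and the current row.
print(frame_rows(ids, 2, -1, 0))  # [20, 30]
```

Note that the last row of the partition has no following row, so its frame shrinks to the current row alone.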

Syntax

Window.rowsBetween(start, end)

Parameters

Parameter | Type | Description
start | int | Boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding, or any value less than or equal to -9223372036854775808.
end | int | Boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing, or any value greater than or equal to 9223372036854775807.

Returns

WindowSpec

Notes

Use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values rather than using integral values directly.

Examples

Python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])

# Calculate the sum of id from the current row to current row + 1 in each category partition.
window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category", "sum").show()
# +---+--------+---+
# | id|category|sum|
# +---+--------+---+
# |  1|       a|  2|
# |  1|       a|  3|
# |  1|       b|  3|
# |  2|       a|  2|
# |  2|       b|  5|
# |  3|       b|  3|
# +---+--------+---+