rowsBetween (Window)
Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).
Both start and end are relative positions from the current row. For example, 0 means "current row", -1 means the row before the current row, and 5 means the fifth row after the current row.
A row-based boundary is determined by the position of the row within the partition, not by the row's values. An offset gives the number of rows before (negative) or after (positive) the current row at which the frame starts or ends.
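The offset arithmetic described above can be illustrated without Spark. The following is a minimal plain-Python sketch, not how Spark implements it: `row_frame` is a hypothetical helper written for this illustration, and it assumes the partition is already an ordered list.

```python
def row_frame(partition, index, start, end):
    """Return the rows in the frame [index + start, index + end], both inclusive.

    Hypothetical helper for illustration only. `partition` is an ordered list,
    and `start` / `end` are row offsets relative to the current row at `index`.
    """
    # Clamp the frame to the partition: it never reaches into other partitions.
    lo = max(index + start, 0)
    hi = min(index + end, len(partition) - 1)
    return partition[lo:hi + 1]


ids = [1, 2, 3]

# Offsets (0, 1): the current row and the row after it.
forward = [sum(row_frame(ids, i, 0, 1)) for i in range(len(ids))]

# Offsets (-1, 0): the row before the current row and the current row.
backward = [sum(row_frame(ids, i, -1, 0)) for i in range(len(ids))]

print(forward)   # [3, 5, 3]
print(backward)  # [1, 3, 5]
```

At the edges of the partition the frame simply contains fewer rows, which is why the last element of `forward` and the first element of `backward` are sums over a single row.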
Syntax
Window.rowsBetween(start, end)
Parameters
| Parameter | Type | Description |
|---|---|---|
| start | int | Boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding. |
| end | int | Boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing. |
Returns
WindowSpec
Notes
Use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values rather than using integral values directly.
Examples
from pyspark.sql import Window, functions as sf
df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"])
# Calculate the sum of id from the current row to current row + 1 in each category partition.
window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category", "sum").show()
# +---+--------+---+
# | id|category|sum|
# +---+--------+---+
# |  1|       a|  2|
# |  1|       a|  3|
# |  1|       b|  3|
# |  2|       a|  2|
# |  2|       b|  5|
# |  3|       b|  3|
# +---+--------+---+