Window class

Utility functions for defining window in DataFrames.

Supports Spark Connect

Class attributes

Attribute	Description
`unboundedPreceding`	Boundary value representing the start of an unbounded window frame.
`unboundedFollowing`	Boundary value representing the end of an unbounded window frame.
`currentRow`	Boundary value representing the current row in a window frame.

Methods

Method	Description
`orderBy(*cols)`	Creates a WindowSpec with the ordering defined.
`partitionBy(*cols)`	Creates a WindowSpec with the partitioning defined.
`rangeBetween(start, end)`	Creates a WindowSpec with the frame boundaries defined, from `start` (inclusive) to `end` (inclusive), using range-based offsets from the current row's ORDER BY value.
`rowsBetween(start, end)`	Creates a WindowSpec with the frame boundaries defined, from `start` (inclusive) to `end` (inclusive), using row-based offsets from the current row.

Notes

When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.

Examples

Basic window with ordering and row frame

Python
from pyspark.sql import Window

# ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
window = Window.orderBy("date").rowsBetween(Window.unboundedPreceding, Window.currentRow)

Partitioned window with range frame

Python
from pyspark.sql import Window

# PARTITION BY country ORDER BY date RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING
window = Window.orderBy("date").partitionBy("country").rangeBetween(-3, 3)

Row number within partition

Python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"]
)

# Show row number ordered by id within each category partition
window = Window.partitionBy("category").orderBy("id")
df.withColumn("row_number", sf.row_number().over(window)).show()

Running sum with row-based frame

Python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"]
)

# Sum id values from the current row to the next row within each partition
window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category", "sum").show()

Running sum with range-based frame

Python
from pyspark.sql import Window, functions as sf

df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")], ["id", "category"]
)

# Sum id values from the current id value to id + 1 within each partition
window = Window.partitionBy("category").orderBy("id").rangeBetween(Window.currentRow, 1)
df.withColumn("sum", sf.sum("id").over(window)).sort("id", "category").show()

Class attributes​

Methods​

Notes​

Examples​

Basic window with ordering and row frame​

Partitioned window with range frame​

Row number within partition​

Running sum with row-based frame​

Running sum with range-based frame​