
Window frame clause

Applies to: Databricks SQL, Databricks Runtime

Specifies a sliding subset of rows within the partition on which the aggregate or analytic window function operates.

Syntax

{ frame_mode frame_start |
frame_mode BETWEEN frame_start AND frame_end }

frame_mode
{ RANGE | ROWS }

frame_start
{ UNBOUNDED PRECEDING |
offset_start PRECEDING |
CURRENT ROW |
offset_start FOLLOWING }

frame_end
{ offset_stop PRECEDING |
CURRENT ROW |
offset_stop FOLLOWING |
UNBOUNDED FOLLOWING }
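
The first syntax form omits frame_end, in which case the frame stops at CURRENT ROW. A minimal sketch of that equivalence, assuming inline data supplied through VALUES (the names t, id, and v are illustrative and not part of this page):

```sql
-- Hypothetical inline data: t, id, and v are illustrative names.
-- Both window specifications describe the same frame:
-- the previous row plus the current row.
> SELECT id,
         v,
         SUM(v) OVER (ORDER BY id ROWS 1 PRECEDING) AS short_form,
         SUM(v) OVER (ORDER BY id
                      ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS between_form
    FROM VALUES (1, 10), (2, 20), (3, 30) AS t(id, v)
   ORDER BY id;
 1 10 10 10
 2 20 30 30
 3 30 50 50
```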

Parameters

  • frame_mode

    • ROWS

      If specified, the sliding window frame is expressed in terms of rows preceding or following the current row.

    • RANGE

      If specified, the window function must specify an ORDER BY clause with a single expression obExpr.

      The boundaries of the sliding window are then expressed as an offset from the obExpr for the current row.

  • frame_start

    The starting position of the sliding window frame relative to the current row.

    • UNBOUNDED PRECEDING

      Specifies that the window frame starts at the beginning of the partition.

    • offset_start PRECEDING

      If the mode is ROWS, offset_start is a positive integer literal specifying how many rows before the current row the frame starts.

      If the mode is RANGE, offset_start is a positive literal value of a type that can be subtracted from obExpr. The frame starts at the first row of the partition for which obExpr is greater than or equal to obExpr - offset_start at the current row.

    • CURRENT ROW

      Specifies that the frame starts at the current row.

    • offset_start FOLLOWING

      If the mode is ROWS, offset_start is a positive integer literal specifying how many rows after the current row the frame starts.

      If the mode is RANGE, offset_start is a positive literal value of a type that can be added to obExpr. The frame starts at the first row of the partition for which obExpr is greater than or equal to obExpr + offset_start at the current row.

  • frame_end

    The end of the sliding window frame relative to the current row.

    If not specified, the frame stops at the CURRENT ROW. The end of the window frame must not precede its start.

    • offset_stop PRECEDING

      If frame_mode is ROWS, offset_stop is a positive integer literal specifying how many rows before the current row the frame stops. If frame_mode is RANGE, offset_stop is a positive literal value of the same type as offset_start. The frame ends at the last row of the partition for which obExpr is less than or equal to obExpr - offset_stop at the current row.

    • CURRENT ROW

      Specifies that the frame stops at the current row.

    • offset_stop FOLLOWING

      If frame_mode is ROWS, offset_stop is a positive integer literal specifying how many rows after the current row the frame ends. If frame_mode is RANGE, offset_stop is a positive literal value of the same type as offset_start. The frame ends at the last row of the partition for which obExpr is less than or equal to obExpr + offset_stop at the current row.

    • UNBOUNDED FOLLOWING

      Specifies that the window frame stops at the end of the partition.
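
To see how RANGE offsets select rows by value rather than by position, the following sketch reports the first value, last value, and row count of each frame. The inline data and the names t and x are assumptions made for illustration; the output follows from the boundary rules stated above.

```sql
-- Hypothetical inline data: t and x are illustrative names.
-- For each row the frame holds every row whose x lies in [x - 2, x + 1].
> SELECT x,
         MIN(x)   OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS frame_first,
         MAX(x)   OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS frame_last,
         COUNT(x) OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS frame_rows
    FROM VALUES (1), (2), (4), (5), (9) AS t(x)
   ORDER BY x;
 1 1 2 2
 2 1 2 2
 4 2 5 3
 5 4 5 2
 9 9 9 1
```

For x = 4 the frame covers values from 4 - 2 = 2 through 4 + 1 = 5, so it holds the rows 2, 4, and 5. For x = 9 no other value falls between 7 and 10, so the frame holds only the current row.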

Examples

SQL
> CREATE TABLE employees
(name STRING, dept STRING, salary INT, age INT);
> INSERT INTO employees
VALUES ('Lisa', 'Sales', 10000, 35),
('Evan', 'Sales', 32000, 38),
('Fred', 'Engineering', 21000, 28),
('Alex', 'Sales', 30000, 33),
('Tom', 'Engineering', 23000, 33),
('Jane', 'Marketing', 29000, 28),
('Jeff', 'Marketing', 35000, 38),
('Paul', 'Engineering', 29000, 23),
('Chloe', 'Engineering', 23000, 25);

-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum of salary within each department.
> SELECT name,
dept,
salary,
SUM(salary) OVER (PARTITION BY dept ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;
Fred Engineering 21000 21000
Chloe Engineering 23000 44000
Tom Engineering 23000 67000
Paul Engineering 29000 96000
Jane Marketing 29000 29000
Jeff Marketing 35000 64000
Lisa Sales 10000 10000
Alex Sales 30000 40000
Evan Sales 32000 72000

-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: moving average over three adjacent rows.
> SELECT name,
dept,
salary,
ROUND(AVG(salary) OVER (PARTITION BY dept ORDER BY salary
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) AS moving_avg
FROM employees;
Fred Engineering 21000 22000
Chloe Engineering 23000 22333
Tom Engineering 23000 25000
Paul Engineering 29000 26000
Jane Marketing 29000 32000
Jeff Marketing 35000 32000
Lisa Sales 10000 20000
Alex Sales 30000 24000
Evan Sales 32000 31000

-- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: sum from the current row to the end of the partition.
> SELECT name,
dept,
salary,
SUM(salary) OVER (PARTITION BY dept ORDER BY salary
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
FROM employees;
Fred Engineering 21000 96000
Chloe Engineering 23000 75000
Tom Engineering 23000 52000
Paul Engineering 29000 29000
Jane Marketing 29000 64000
Jeff Marketing 35000 35000
Lisa Sales 10000 72000
Alex Sales 30000 62000
Evan Sales 32000 32000

-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum using value-based range.
-- Unlike ROWS mode, RANGE groups rows with equal `ORDER BY` values together.
> SELECT name,
dept,
salary,
SUM(salary) OVER (PARTITION BY dept ORDER BY salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
FROM employees;
Fred Engineering 21000 21000
Chloe Engineering 23000 67000
Tom Engineering 23000 67000
Paul Engineering 29000 96000
Jane Marketing 29000 29000
Jeff Marketing 35000 64000
Lisa Sales 10000 10000
Alex Sales 30000 40000
Evan Sales 32000 72000

-- RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING: sum of salaries within +/- 5000 of the current row's salary.
> SELECT name,
dept,
salary,
SUM(salary) OVER (PARTITION BY dept ORDER BY salary
RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING) AS nearby_total
FROM employees;
Fred Engineering 21000 67000
Chloe Engineering 23000 67000
Tom Engineering 23000 67000
Paul Engineering 29000 29000
Jane Marketing 29000 29000
Jeff Marketing 35000 35000
Lisa Sales 10000 10000
Alex Sales 30000 62000
Evan Sales 32000 62000

-- Comparing ROWS vs RANGE: the difference is visible when there are duplicate `ORDER BY` values.
-- With ROWS, `Chloe` and `Tom` have different running totals because each row is counted individually.
-- With RANGE, `Chloe` and `Tom` have the same total because they share the same salary value.
> SELECT name,
salary,
SUM(salary) OVER (ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
SUM(salary) OVER (ORDER BY salary
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
FROM employees
WHERE dept = 'Engineering';
Fred 21000 21000 21000
Chloe 23000 44000 67000
Tom 23000 67000 67000
Paul 29000 96000 96000
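
The examples above always include the current row in the frame. As a sketch of frames that exclude it, the query below sums the two rows before the current row and, separately, all rows after it. The inline data and the names t, id, and v are hypothetical. When a frame contains no rows, SUM returns NULL.

```sql
-- Hypothetical inline data: t, id, and v are illustrative names.
-- prev_two uses offset PRECEDING for both frame bounds;
-- next_all starts the frame one row after the current row.
> SELECT id,
         v,
         SUM(v) OVER (ORDER BY id
                      ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS prev_two,
         SUM(v) OVER (ORDER BY id
                      ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS next_all
    FROM VALUES (1, 10), (2, 20), (3, 30), (4, 40) AS t(id, v)
   ORDER BY id;
 1 10 NULL 90
 2 20 10 70
 3 30 30 40
 4 40 50 NULL
```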