Skip to main content

PIVOT clause

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Transforms the rows of the preceding table_reference by rotating unique values of a specified column list into separate columns.

Syntax

PIVOT ( { aggregate_expression [ [ AS ] agg_column_alias ] } [, ...]
FOR column_list IN ( expression_list ) )

column_list
{ column_name |
( column_name [, ...] ) }

expression_list
{ expression [ AS ] [ column_alias ] |
{ ( expression [, ...] ) [ AS ] [ column_alias] } [, ...] ) }

Parameters

  • aggregate_expression

    An expression of any type where all column references table_reference are arguments to aggregate functions.

  • agg_column_alias

    An optional alias for the result of the aggregation. If no alias is specified, PIVOT generates an alias based on aggregate_expression.

  • column_list

    The set of columns to be rotated.

  • expression_list

    Maps values from column_list to column aliases.

    • expression

      A literal expression with a type that shares a least common type with the respective column_name. If the expression is not a literal, Databricks raises NON_LITERAL_PIVOT_VALUES. If the type does not match, Databricks raises PIVOT_VALUE_DATA_TYPE_MISMATCH.

      The number of expressions in each tuple must match the number of column_names in column_list.

    • column_alias

      An optional alias specifying the name of the generated column. If no alias is specified PIVOT generates an alias based on the expressions.

Result

A temporary table of the following form:

  • All the columns from the intermediate result set of the table_reference that have not been specified in any aggregate_expression or column_list.

    These columns are grouping columns.

  • For each expression tuple and aggregate_expression combination, PIVOT generates one column. The type is the type of aggregate_expression.

    If there is only one aggregate_expression the column is named using column_alias. Otherwise, it is named column_alias_agg_column_alias.

    The value in each cell is the result of the aggregation_expression using a FILTER ( WHERE column_list IN (expression, ...).

Common error conditions

Examples

SQL
-- A very basic PIVOT
-- Given a table with sales by quarter, return a table that returns sales across quarters per year.
> CREATE TEMP VIEW sales(year, quarter, region, sales) AS
VALUES (2018, 1, 'east', 100),
(2018, 2, 'east', 20),
(2018, 3, 'east', 40),
(2018, 4, 'east', 40),
(2019, 1, 'east', 120),
(2019, 2, 'east', 110),
(2019, 3, 'east', 80),
(2019, 4, 'east', 60),
(2018, 1, 'west', 105),
(2018, 2, 'west', 25),
(2018, 3, 'west', 45),
(2018, 4, 'west', 45),
(2019, 1, 'west', 125),
(2019, 2, 'west', 115),
(2019, 3, 'west', 85),
(2019, 4, 'west', 65);

> SELECT year, region, q1, q2, q3, q4
FROM sales
PIVOT (sum(sales) AS sales
FOR quarter
IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
year region q1 q2 q3 q4
2018 east 100 20 40 40
2019 east 120 110 80 60
2018 west 105 25 45 45
2019 west 125 115 85 65

-- The same query written without PIVOT
> SELECT year, region,
sum(sales) FILTER(WHERE quarter = 1) AS q1,
sum(sales) FILTER(WHERE quarter = 2) AS q2,
sum(sales) FILTER(WHERE quarter = 3) AS q2,
sum(sales) FILTER(WHERE quarter = 4) AS q4
FROM sales
GROUP BY year, region;
year region q1 q2 q3 q4
2018 east 100 20 40 40
2019 east 120 110 80 60
2018 west 105 25 45 45
2019 west 125 115 85 65

-- Also PIVOT on region
> SELECT year, q1_east, q1_west, q2_east, q2_west, q3_east, q3_west, q4_east, q4_west
FROM sales
PIVOT (sum(sales) AS sales
FOR (quarter, region)
IN ((1, 'east') AS q1_east, (1, 'west') AS q1_west, (2, 'east') AS q2_east, (2, 'west') AS q2_west,
(3, 'east') AS q3_east, (3, 'west') AS q3_west, (4, 'east') AS q4_east, (4, 'west') AS q4_west));
year q1_east q1_west q2_east q2_west q3_east q3_west q4_east q4_west
2018 100 105 20 25 40 45 40 45
2019 120 125 110 115 80 85 60 65

-- The same query written without PIVOT
> SELECT year,
sum(sales) FILTER(WHERE (quarter, region) IN ((1, 'east'))) AS q1_east,
sum(sales) FILTER(WHERE (quarter, region) IN ((1, 'west'))) AS q1_west,
sum(sales) FILTER(WHERE (quarter, region) IN ((2, 'east'))) AS q2_east,
sum(sales) FILTER(WHERE (quarter, region) IN ((2, 'west'))) AS q2_west,
sum(sales) FILTER(WHERE (quarter, region) IN ((3, 'east'))) AS q3_east,
sum(sales) FILTER(WHERE (quarter, region) IN ((3, 'west'))) AS q3_west,
sum(sales) FILTER(WHERE (quarter, region) IN ((4, 'east'))) AS q4_east,
sum(sales) FILTER(WHERE (quarter, region) IN ((4, 'west'))) AS q4_west
FROM sales
GROUP BY year;
year q1_east q1_west q2_east q2_west q3_east q3_west q4_east q4_west
2018 100 105 20 25 40 45 40 45
2019 120 125 110 115 80 85 60 65

-- To aggregate across regions the column must be removed from the input.
> SELECT year, q1, q2, q3, q4
FROM (SELECT year, quarter, sales FROM sales) AS s
PIVOT (sum(sales) AS sales
FOR quarter
IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
year q1 q2 q3 q4
2018 205 45 85 85
2019 245 225 165 125

-- The same query without PIVOT
> SELECT year,
sum(sales) FILTER(WHERE quarter = 1) AS q1,
sum(sales) FILTER(WHERE quarter = 2) AS q2,
sum(sales) FILTER(WHERE quarter = 3) AS q3,
sum(sales) FILTER(WHERE quarter = 4) AS q4
FROM sales
GROUP BY year;
year q1 q2 q3 q4
2018 205 45 85 85
2019 245 225 165 125

-- A PIVOT with multiple aggregations
> SELECT year, q1_total, q1_avg, q2_total, q2_avg, q3_total, q3_avg, q4_total, q4_avg
FROM (SELECT year, quarter, sales FROM sales) AS s
PIVOT (sum(sales) AS total, avg(sales) AS avg
FOR quarter
IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4));
year q1_total q1_avg q2_total q2_avg q3_total q3_avg q4_total q4_avg
2018 205 102.5 45 22.5 85 42.5 85 42.5
2019 245 122.5 225 112.5 165 82.5 125 62.5

-- The same query without PIVOT
> SELECT year,
sum(sales) FILTER(WHERE quarter = 1) AS q1_total,
avg(sales) FILTER(WHERE quarter = 1) AS q1_avg,
sum(sales) FILTER(WHERE quarter = 2) AS q2_total,
avg(sales) FILTER(WHERE quarter = 2) AS q2_avg,
sum(sales) FILTER(WHERE quarter = 3) AS q3_total,
avg(sales) FILTER(WHERE quarter = 3) AS q3_avg,
sum(sales) FILTER(WHERE quarter = 4) AS q4_total,
avg(sales) FILTER(WHERE quarter = 4) AS q4_avg
FROM sales
GROUP BY year;
year q1_total q1_avg q2_total q2_avg q3_total q3_avg q4_total q4_avg
2018 205 102.5 45 22.5 85 42.5 85 42.5
2019 245 122.5 225 112.5 165 82.5 125 62.5

-- Pivot values must be literals.
> SELECT * FROM sales
PIVOT (sum(sales) FOR quarter IN (1 + 0 AS q1));
Error: NON_LITERAL_PIVOT_VALUES