Common table expression (CTE)

Applies to: Databricks SQL Databricks Runtime

Defines a temporary result set that you can reference possibly multiple times within the scope of a SQL statement. A CTE is used mainly in a SELECT statement.

Syntax

WITH [ RECURSIVE ] common_table_expression [, ...]

common_table_expression
  view_identifier [ ( column_identifier [, ...] ) ] [ recursion_limit ] [ AS ] ( query | recursive_query )

recursion_limit
  MAX RECURSION LEVEL maxLevel

recursive_query
  base_case_query UNION ALL step_case_query

Parameters

RECURSIVE

Preview
This feature is in Public Preview.

Applies to: Databricks Runtime 17.0 and later

When specified, the common table expressions can contain a recursive_query.
view_identifier

An identifier by which the common_table_expression can be referenced
column_identifier

An optional identifier by which a column of the common_table_expression can be referenced.

If column_identifiers are specified their number must match the number of columns returned by the query. If no names are specified the column names are derived from the query.
MAX RECURSION LEVEL maxLevel

Applies to: Databricks Runtime 17.0 and later

Specifies the maximum number of recursive steps that can be performed by a recursive CTE. maxLevel must be an INTEGER literal > 0. The default value is 100.

If the maxLimit is exceeded, RECURSION_LEVEL_LIMIT_EXCEEDED is raised. If the CTE is not recursive this clause is ignored.
query

A query producing a result set.
recursive_query

Applies to: Databricks Runtime 17.0 and later

When RECURSIVE is specified, a query of a CTE can refer to itself. This makes a CTE a recursive CTE.
- base_case_subquery
  
  This subquery provides the seed or anchor for the recursion. It must not reference view_identifier.
- step_subquery
  
  This subquery produces a new set of rows based on the previous step. To be recursive it must reference view_identifier.

Limitations

Recursive CTEs are not supported for UPDATE, DELETE, and MERGE statements.
step_subquery must not include correlated column references to view_identifier.
Invoking random number generators from the recursive part can generate the same random number in each iteration.
The total size of the result set is limited to 1,000,000 rows. This limit can be overridden if the SELECT statement driving the recursive CTE includes a LIMIT clause, which effectively controls the recursion.

Examples

SQL
-- CTE with multiple column aliases
> WITH t(x, y) AS (SELECT 1, 2)
  SELECT * FROM t WHERE x = 1 AND y = 2;
   1   2

-- CTE in CTE definition
> WITH t AS (
    WITH t2 AS (SELECT 1)
    SELECT * FROM t2)
  SELECT * FROM t;
   1

-- CTE in subquery
> SELECT max(c) FROM (
    WITH t(c) AS (SELECT 1)
    SELECT * FROM t);
      1

-- CTE in subquery expression
> SELECT (WITH t AS (SELECT 1)
          SELECT * FROM t);
                1

-- CTE in CREATE VIEW statement
> CREATE VIEW v AS
    WITH t(a, b, c, d) AS (SELECT 1, 2, 3, 4)
    SELECT * FROM t;
> SELECT * FROM v;
   1   2   3   4

-- CTE names are scoped
> WITH t  AS (SELECT 1),
       t2 AS (
        WITH t AS (SELECT 2)
        SELECT * FROM t)
    SELECT * FROM t2;
   2

Recursive query examples

Directed graph examples

The following example shows how to find all destinations from New York to other cities, including direct routes and those routed through other cities:

SQL
> DROP VIEW IF EXISTS routes;
> CREATE TEMPORARY VIEW routes(origin, destination) AS VALUES
  ('New York', 'Washington'),
  ('New York', 'Boston'),
  ('Boston', 'New York'),
  ('Washington', 'Boston'),
  ('Washington', 'Raleigh');

> WITH RECURSIVE destinations_from_new_york AS (
    SELECT 'New York' AS destination, ARRAY('New York') AS path, 0 AS length
    UNION ALL
    SELECT r.destination, CONCAT(d.path, ARRAY(r.destination)), d.length + 1
      FROM routes AS r
      JOIN destinations_from_new_york AS d
        ON d.destination = r.origin AND NOT ARRAY_CONTAINS(d.path, r.destination)
  )
  SELECT * FROM destinations_from_new_york;
  destination path                                length
  ----------- ----------------------------------- ------
  New York    ["New York"]                        0
  Washington  ["New York","Washington"]           1
  Boston      ["New York","Boston"]               1
  Boston      ["New York","Washington","Boston"]  2
  Raleigh     ["New York","Washington","Raleigh"] 2

The following example demonstrates how to traverse a directed graph and check for the presence of an infinite loop in the traversal:

SQL
> DROP TABLE IF EXISTS graph;
> CREATE TABLE IF NOT EXISTS graph( f int, t int, label string );
> INSERT INTO graph VALUES
    (1, 2, 'arc 1 -> 2'),
    (1, 3, 'arc 1 -> 3'),
    (2, 3, 'arc 2 -> 3'),
    (1, 4, 'arc 1 -> 4'),
    (4, 5, 'arc 4 -> 5'),
    (5, 1, 'arc 5 -> 1');
> WITH RECURSIVE search_graph(f, t, label, path, cycle) AS (
    SELECT *, array(struct(g.f, g.t)), false FROM graph g
    UNION ALL
    SELECT g.*, path || array(struct(g.f, g.t)), array_contains(path, struct(g.f, g.t))
      FROM graph g, search_graph sg
      WHERE g.f = sg.t AND NOT cycle
  )
  SELECT path FROM search_graph;

The result is a set of paths of increasing length while avoiding cycles and ending up with a non-infinite path

SQL
[{"f":1,"t":4},{"f":4,"t":5},{"f":5,"t":1},{"f":1,"t":2},{"f":2,"t":3}]`

that visits all nodes with a clear stop at sink “3”. A depth-first search is achieved by modifying the last line to order by the path column:

SQL
SELECT * FROM search_graph ORDER BY path;

The following examples demonstrate basic recursive techniques:

SQL
-- Simple loop-like example.
> WITH RECURSIVE r(col) AS (
    SELECT 'a'
    UNION ALL
    SELECT col || char(ascii(substr(col, -1)) + 1) FROM r WHERE LENGTH(col) < 10
  )
  SELECT * FROM r;
  col
  ----------
  a
  ab
  abc
  abcd
  abcde
  abcdef
  abcdefg
  abcdefgh
  abcdefghi
  abcdefghij

-- Another loop-like example.
> WITH RECURSIVE t(n) AS (
    SELECT 'foo'
    UNION ALL
    SELECT n || ' bar' FROM t WHERE length(n) < 20
  )
  SELECT n AS is_text FROM t;
  is_text
  -------
  foo
  foo bar
  foo bar bar
  foo bar bar bar
  foo bar bar bar bar
  foo bar bar bar bar bar

-- Invoking a recursive CTE from another CTE.
> WITH RECURSIVE x(id) AS (VALUES (1) UNION ALL SELECT id+1 FROM x WHERE id < 5),
                 y(id) AS (VALUES (1) UNION ALL SELECT id+1 FROM x WHERE id < 10)
  SELECT y.*, x.* FROM y LEFT JOIN x USING (id);
  id  id
  --  --
   1  1
   4  4
   6  null
   3  3
   5  5
   2  2

-- An example of invoking a non-recursive CTE from a recursive CTE.
> WITH RECURSIVE
    y (id) AS (VALUES (1)),
    x (id) AS (SELECT * FROM y UNION ALL SELECT id+1 FROM x WHERE id < 5)
  SELECT * FROM x;
  id
  --
   1
   2
   3
   4
   5

-- Temporary views
> CREATE OR REPLACE TEMPORARY VIEW nums AS
    WITH RECURSIVE nums (n) AS (
      VALUES (1)
      UNION ALL
      SELECT n+1 FROM nums WHERE n < 6
    )
    SELECT * FROM nums;
> SELECT * FROM nums;
  n
  __
   1
   2
   3
   4
   5
   6

> INSERT into a table
> CREATE TABLE rt(level INT);

> WITH RECURSIVE r(level) AS (
    VALUES (0)
    UNION ALL
    SELECT level + 1 FROM r WHERE level < 9
  )
  INSERT INTO rt SELECT * FROM r;

> SELECT * from rt;
  level
  -----
      0
      1
      2
      3
      4
      5
      6
      7
      8
      9

-- Nested recursive CTE are supported (only inside anchor)
-- Returns a triangular half – total of 55 rows – of a 10x10 square of numbers 1-10
> WITH RECURSIVE t(j) AS (
    WITH RECURSIVE s(i) AS (
      VALUES (1)
      UNION ALL
      SELECT i+1 FROM s WHERE i < 10
    )
    SELECT i FROM s
    UNION ALL
    SELECT j+1 FROM t WHERE j < 10
  )
  SELECT * FROM t;
  j
  --
   1
   2
   2
   3
   3
   3
   4
  ...
   9
  10
  10
  10
  10
  10
  10
  10
  10
  10
  10

-- Honoring LIMIT
> WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n+1 FROM t
  )
  SELECT * FROM t LIMIT 10;
  n
  --
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10

-- Organizational tree examples
-- Extracting all departments under 'A'
> CREATE TABLE department (
    id INTEGER, -- department ID
    parent_department INTEGER, -- upper department ID
    name string -- department name
  );

> INSERT INTO department VALUES
    (0, NULL, 'ROOT'),
    (1, 0, 'A'),
    (2, 1, 'B'),
    (3, 2, 'C'),
    (4, 2, 'D'),
    (5, 0, 'E'),
    (6, 4, 'F'),
    (7, 5, 'G');

> WITH RECURSIVE subdepartment AS (
    SELECT name as root_name, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.root_name, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment ORDER BY name;
  root_name id  parent_department  name
  --------- --  -----------------  ----
          A	 1                  0     A
          A	 2	                1     B
          A	 3                  2     C
          A	 4                  2     D
          A	 6	                4     F

-- The same as above but extracting the level number:
> WITH RECURSIVE subdepartment(level, id, parent_department, name) AS (
    SELECT 1, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.level + 1, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment ORDER BY name;
  level  id  parent_department  name
  -----  --  -----------------  ----
      1   1                  0     A
      2   2                  1     B
      3   3                  2     C
      3   4                  2     D
      4   6                  4     F

-- The same as above but only shows level 2 or more
> WITH RECURSIVE subdepartment(level, id, parent_department, name) AS (
    SELECT 1, * FROM department WHERE name = 'A'
    UNION ALL
    SELECT sd.level + 1, d.* FROM department AS d, subdepartment AS sd
     WHERE d.parent_department = sd.id
  )
  SELECT * FROM subdepartment WHERE level >= 2 ORDER BY name;
  level id  parent_department  name
  ----- --  -----------------  ----
  2      2                  1     B
  3      3                  2     C
  3      4                  2     D
  4      6                  4     F

-- Binary tree examples
-- Another tree example is to find all paths from the second level nodes “2” and “3” to all other nodes.
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2), ( 6, 2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14, 9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT t1.*, t2.*
    FROM t AS t1 JOIN t AS t2
      ON (t1.path[0] = t2.path[0] AND
          size(t1.path) = 1 AND
          size(t2.path) > 1)
  ORDER BY t1.id, t2.id;
  id  path  id  path
  --  ----  --  ------------
   2  [2]    4  [2,4]
   2  [2]    5  [2,5]
   2  [2]    6  [2,6]
   2  [2]    9  [2,4,9]
   2  [2]   10  [2,4,10]
   2  [2]   14  [2,4,9,14]
   3  [3]    7  [3,7]
   3  [3]    8  [3,8]
   3  [3]   11  [3,7,11]
   3  [3]   12  [3,7,12]
   3  [3]   13  [3,7,13]
   3  [3]   15  [3,7,11,15]
   3  [3]   16  [3,7,11,16]

-- Another tree example counts the leaf nodes reachable from nodes “2” and “3”
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), ( 2,1), (3,1 ), ( 4,2), ( 5,2), ( 6,2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14,9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT t1.id, count(*)
    FROM t AS t1 JOIN t AS t2
      ON (t1.path[0] = t2.path[0] AND
          size(t1.path) = 1 AND
          size(t2.path) > 1)
  GROUP BY t1.id
  ORDER BY t1.id;
  id  count(1)
  --  --------
   2         6
   3         7

-- Another tree example that is listing the paths to leaf node:
> CREATE TABLE tree(id INTEGER, parent_id INTEGER);
> INSERT INTO tree
    VALUES (1, NULL), ( 2,1), ( 3,1), ( 4,2), ( 5,2), ( 6,2), ( 7, 3), ( 8, 3),
           (9,    4), (10,4), (11,7), (12,7), (13,7), (14,9), (15,11), (16,11);

> WITH RECURSIVE t(id, path) AS (
    VALUES(1,cast(array() as array<Int>))
    UNION ALL
    SELECT tree.id, t.path || array(tree.id)
      FROM tree JOIN t ON (tree.parent_id = t.id)
  )
  SELECT id, path FROM t
    ORDER BY id;
  id  path
  --  -----------
   1  []
   2  [2]
   3  [3]
   4  [2,4]
   5  [2,5]
   6  [2,6]
   7  [3,7]
   8  [3,8]
   9  [2,4,9]
  10  [2,4,10]
  11  [3,7,11]
  12  [3,7,12]
  13  [3,7,13]
  14  [2,4,9,14]
  15  [3,7,11,15]
  16  [3,7,11,16]

-- Sudoku solver (https://sqlite.org)
-- Highlighted text represent the row-wise scan of a 9x9 Sudoku grid where missing values are represented with spaces.
> WITH RECURSIVE generate_series(gs) AS (
    SELECT 1
    UNION ALL
    SELECT gs+1 FROM generate_series WHERE gs < 9
  ),
  x( s, ind ) AS
    ( SELECT sud, position( ' ' IN sud )
      FROM  (SELECT '4  8725  5  64 213 29   8      6 73 1 8 2 4  97  15 2 3  2  1    659   28 2 37946'::STRING
        AS sud) xx
    UNION ALL
    SELECT substr( s, 1, ind - 1 ) || z || substr( s, ind + 1 ),
           position(' ' in repeat('x',ind) || substr( s, ind + 1 ) )
    FROM x,
     (SELECT gs::STRING AS z FROM generate_series) z
    WHERE ind > 0
      AND NOT EXISTS ( SELECT *
        FROM generate_series
        WHERE z.z = substr( s, ( (ind - 1 ) DIV 9 ) * 9 + gs, 1 )
           OR    z.z = substr( s, mod( ind - 1, 9 ) - 8 + gs * 9, 1 )
           OR    z.z = substr( s, mod( ( ( ind - 1 ) DIV 3 ), 3 ) * 3
                       + ( ( ind - 1 ) DIV 27 ) * 27 + gs
                       + ( ( gs - 1 ) DIV 3 ) * 6, 1 )
     )
  )
  SELECT s FROM x WHERE ind = 0;
  431872569587649213629351874245968731168723495973415628394286157716594382852137946

-- You can underspecify the problem, for example by replacing the leading ‘4’ with space ‘ ‘ which will yield two solutions
431872569587649213629351874245968731168723495973415628394286157716594382852137946
631872594587649213429351867245968731168723459973415628394286175716594382852137946

-- Finding prime numbers (Sieve of Eratosthenes, https://wiki.postgresql.org)
-- This is an implementation of the Sieve of Erastothenes algorithm for finding the prime numbers up to 150.
> WITH RECURSIVE t1(n) AS (
    VALUES(2)
    UNION ALL
    SELECT n+1 FROM t1 WHERE n < 100
  ),
  t2 (n, i) AS (
    SELECT 2*n, 2
      FROM t1 WHERE 2*n <= 150
    UNION ALL
    (
      WITH t3(k) AS (
        SELECT max(i) OVER () + 1 FROM t2
      ),
      t4(k) AS (
        SELECT DISTINCT k FROM t3
      )
      SELECT k*n, k
        FROM t1 CROSS JOIN t4
        WHERE k*k <= 150
    )
  )
  SELECT n FROM t1 EXCEPT
    SELECT n FROM t2
  ORDER BY 1;
  2
  3
  5
  7
  11
  13
  17
  19
  ...

-- using MAX RECURSION LEVEL to limit the recursion depth
> WITH RECURSIVE t(n) MAX RECURSION LEVEL 10 AS (
    VALUES(1)
    UNION ALL
    SELECT n+1 FROM t WHERE n < 100
  )
  SELECT * FROM t;
Error: RECURSION_LEVEL_LIMIT_EXCEEDED

-- using LIMIT to limit the the result set, and indirectly the recursion depth
> WITH RECURSIVE t(n) AS (
    VALUES(1)
    UNION ALL
    SELECT n+1 FROM t
  )
  SELECT * FROM t LIMIT 10;
   n
  ---
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10

-- LIMIT does not help if an ORDER BY clause is used, which prevents early termination of the recursion
> WITH RECURSIVE t(n) AS (
    VALUES(1)
    UNION ALL
    SELECT n+1 FROM t
  )
  SELECT * FROM t ORDER BY n LIMIT 10;
 Error: RECURSION_LEVEL_LIMIT_EXCEEDED

Query

Syntax​

Parameters​

Limitations​

Examples​

Recursive query examples​

Directed graph examples​

Related articles​