TABLESAMPLE clause (Databricks SQL)

The TABLESAMPLE statement is used to sample the relation.

Syntax

TABLESAMPLE ( { percentage PERCENT ) |
                num_rows ROWS |
                BUCKET fraction OUT OF total } )
            [ REPEATABLE ( seed ) ]

Parameters

  • percentage PERCENT

    An INTEGER or DECIMAL constant percentage between 0 and 100 specifying which percentage of the table’s rows to sample.

  • num_rows ROWS

    A constant positive INTEGER expression num_rows specifying an absolute number of rows out of all rows to sample.

  • BUCKET fraction OUT OF total

    An INTEGER constant fraction specifying the portion out of the INTEGER constant total to sample.

  • REPEATABLE ( seed )

    An optional positive INTEGER constant seed, used to always produce the same set of rows. Use this clause when you want to reissue the query multiple times, and you expect the same set of sampled rows.

Note

  • TABLESAMPLE returns the approximate number of rows or fraction requested.

  • Always use TABLESAMPLE (percent PERCENT) if randomness is important. TABLESAMPLE (num_rows ROWS) is not a simple random sample but instead is implemented using LIMIT.

Examples

> CREATE TEMPORARY VIEW test(id, name) AS
  VALUES ( 1, 'Lisa'),
         ( 2, 'Mary'),
         ( 3, 'Evan'),
         ( 4, 'Fred'),
         ( 5, 'Alex'),
         ( 6, 'Mark'),
         ( 7, 'Lily'),
         ( 8, 'Lucy'),
         ( 9, 'Eric'),
         (10, 'Adam');
> SELECT * FROM test;
  5 Alex
  8 Lucy
  2 Mary
  4 Fred
  1 Lisa
  9 Eric
 10 Adam
  6 Mark
  7 Lily
  3 Evan

> SELECT * FROM test TABLESAMPLE (30 PERCENT) REPEATABLE (123);
  1     Lisa
  2     Mary
  3     Evan
  5     Alex
  8     Lucy

> SELECT * FROM test TABLESAMPLE (5 ROWS);
  5 Alex
  8 Lucy
  2 Mary
  4 Fred
  1 Lisa

> SELECT * FROM test TABLESAMPLE (BUCKET 4 OUT OF 10);
  8 Lucy
  2 Mary
  9 Eric
  6 Mark