Sampling queries (Databricks SQL)
The TABLESAMPLE
statement is used to sample the table. It supports the following sampling methods:
TABLESAMPLE
(xROWS
): Sample the table down to the given number of rows.TABLESAMPLE
(xPERCENT
): Sample the table down to the given percentage. Note that percentages are defined as a number between 0 and 100.TABLESAMPLE
(BUCKET
xOUT OF
y): Sample the table down to ax
out ofy
fraction.
Note
TABLESAMPLE
returns the approximate number of rows or fraction requested.Always use
TABLESAMPLE (X PERCENT)
if randomness is important.TABLESAMPLE (X ROWS)
is not a simple random sample but instead is implemented usingLIMIT
.
Syntax
TABLESAMPLE ({ integer_expression | decimal_expression } PERCENT)
| TABLESAMPLE ( integer_expression ROWS )
| TABLESAMPLE ( BUCKET integer_expression OUT OF integer_expression )
Examples
> SELECT * FROM test;
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
9 Eric
10 Adam
6 Mark
7 Lily
3 Evan
> SELECT * FROM test TABLESAMPLE (50 PERCENT);
5 Alex
2 Mary
4 Fred
9 Eric
10 Adam
3 Evan
> SELECT * FROM test TABLESAMPLE (5 ROWS);
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
> SELECT * FROM test TABLESAMPLE (BUCKET 4 OUT OF 10);
8 Lucy
2 Mary
9 Eric
6 Mark