`CLUSTER BY` clause (TABLE)

Applies to: check marked yes Databricks SQL Databricks Runtime 13.3 LTS and above Delta Lake only

Defines liquid, multi-dimensional clustering for a Delta Lake table.

You can use this clause when you:

Create a table using CREATE TABLE
Alter a table with ALTER TABLE to change the clustering columns. To cluster rows with altered clustering columns, you must run OPTIMIZE. Note that rows clustered by previous clustering columns are not affected.

Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..

For more information on liquid clustering see Use liquid clustering for Delta tables

Syntax

CLUSTER BY { ( column_name [, ...] ] ) |
             NONE }

Parameters

column_name

Specifies columns of the table by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.
NONE

Turns off clustering for the table being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a table, omit the CLUSTER BY clause.

Examples

You can find more examples in Use liquid clustering for Delta tables.

-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);

-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);

-- Recluster the table
> OPTIMIZE t;

-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;