`CLUSTER BY` clause (TABLE)

Applies to: Databricks SQL, Databricks Runtime 13.3 LTS and above, Delta Lake only

Defines liquid, multi-dimensional clustering for a relation.

Databricks recommends using automatic liquid clustering and predictive optimization for all Unity Catalog managed tables. These features provide intelligent optimization of data layout based on your data usage patterns.

You can use this clause when you:

Create a table using CREATE TABLE.
Alter a table with ALTER TABLE to change the clustering columns.
Create a materialized view using CREATE MATERIALIZED VIEW.
Create a streaming table using CREATE STREAMING TABLE.

After you change the clustering columns with ALTER TABLE, you must run OPTIMIZE to cluster rows by the new columns. Rows clustered by the previous clustering columns are not affected.

You cannot change the clustering columns of materialized views or streaming tables with ALTER TABLE.

Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows.
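
As a minimal sketch of the materialized view and streaming table cases listed above: the names `sales`, `raw_events`, `sales_by_region`, and `events_clustered` are hypothetical, and the statements assume an environment where materialized views and streaming tables are available. Because ALTER TABLE cannot change their clustering columns, the CLUSTER BY clause is given when the object is created.

```sql
-- Hypothetical source table `sales`; cluster the materialized view by region
> CREATE MATERIALIZED VIEW sales_by_region
    CLUSTER BY (region)
    AS SELECT region, sum(amount) AS total_amount
       FROM sales
       GROUP BY region;

-- Hypothetical streaming source `raw_events`; cluster the streaming table by event_date
> CREATE STREAMING TABLE events_clustered
    CLUSTER BY (event_date)
    AS SELECT * FROM STREAM(raw_events);
```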

For more information on liquid clustering, see Use liquid clustering for tables.

Syntax

CLUSTER BY { ( column_name [, ...] ) |
             AUTO |
             NONE }

Parameters

column_name

Specifies columns of the relation by which to cluster the data. The column order does not matter. To benefit from a change to the clustering columns, run OPTIMIZE.

AUTO

Applies to: Databricks SQL, Databricks Runtime 15.4 and above

Directs Delta Lake to automatically determine the best columns to cluster by and to adapt them over time. For more information on liquid clustering, see Use liquid clustering for tables.

NONE

Turns off clustering for the relation being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a relation, omit the CLUSTER BY clause.
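
The AUTO and NONE options described above might be used as in the following sketch. The table name `events` and its columns are hypothetical, and the example assumes a Unity Catalog managed table on a runtime that supports AUTO.

```sql
-- Hypothetical table, created with an explicit clustering column
> CREATE TABLE events(id int, region string, amount double) CLUSTER BY (region);

-- Let Delta Lake choose the clustering columns and adapt them over time
> ALTER TABLE events CLUSTER BY AUTO;

-- Turn clustering off again; later OPTIMIZE runs no longer cluster new data
> ALTER TABLE events CLUSTER BY NONE;
```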

Examples

You can find more examples in Use liquid clustering for tables.

SQL
-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);

-- Alter the clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);

-- Recluster the table
> OPTIMIZE t;

-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;
