Skip to main content

CLUSTER BY clause (TABLE)

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above check marked yes Delta Lake only

Defines liquid, multi-dimensional clustering for a Delta Lake table.

Databricks recommends using automatic liquid clustering and predictive optimization for all Unity Catalog managed tables. These features provide intelligent optimization of data layout based on your data usage patterns.

You can use this clause when you:

Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..

For more information on liquid clustering see Use liquid clustering for Delta tables

Syntax

CLUSTER BY { ( column_name [, ...] ) |
AUTO |
NONE }

Parameters

  • column_name

    Specifies columns of the table by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.

  • AUTO

    Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 15.4 and above

    Directs Delta Lake to automatically determine and over time adapt to the best columns to cluster by. For more information on liquid clustering see Use liquid clustering for Delta tables.

  • NONE

    Turns off clustering for the table being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a table, omit the CLUSTER BY clause.

Examples

You can find more examples in Use liquid clustering for Delta tables.

SQL
-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);

-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);

-- Recluster the table
> OPTIMIZE t;

-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;