Rename and drop columns with Delta Lake column mapping

Databricks supports column mapping for Delta Lake tables, which enables metadata-only changes that mark columns as deleted or renamed without rewriting data files. Column mapping also lets you name Delta table columns using characters that Parquet does not allow, such as spaces, so you can ingest CSV or JSON data directly into Delta without first renaming columns to work around those character constraints.

important

Tables with column mapping enabled can only be read in Databricks Runtime 10.4 LTS and above.

If you use a legacy pattern that relies on directory names for reading Delta tables, enabling column mapping might break legacy workloads. Partitioned tables with column mapping enabled use random prefixes instead of column names for partition directories. See Do Delta Lake and Parquet share partitioning strategies?.

Enabling column mapping on tables might break downstream operations that rely on Delta change data feed. See Change data feed limitations for tables with column mapping enabled.

Enabling column mapping on tables might break streaming reads from the Delta table as a source, including in Lakeflow Declarative Pipelines. See Streaming with column mapping and schema changes.

Enable column mapping

Use the following command to enable column mapping:

SQL
ALTER TABLE <table-name> SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name'
)

Column mapping requires the following Delta protocols:

Reader version 2 or above.
Writer version 5 or above.

See Delta Lake feature compatibility and protocols.
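
The following is a minimal Python sketch of how the statement above could be generated for a given table. The helper name enable_column_mapping_sql and the table name main.default.events are hypothetical. Setting delta.minReaderVersion and delta.minWriterVersion explicitly alongside the mapping mode is an assumption based on the protocol versions listed above; whether your runtime needs them set explicitly may vary.

```python
def enable_column_mapping_sql(table_name: str, set_protocol: bool = True) -> str:
    """Build an ALTER TABLE statement that enables column mapping by name.

    Hypothetical helper for illustration; it only builds a SQL string.
    """
    props = []
    if set_protocol:
        # Assumption: pin the protocol versions this section lists
        # (reader version 2, writer version 5) in the same statement.
        props.append("'delta.minReaderVersion' = '2'")
        props.append("'delta.minWriterVersion' = '5'")
    props.append("'delta.columnMapping.mode' = 'name'")
    body = ",\n  ".join(props)
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES (\n  {body}\n)"


# Hypothetical table name, used only for illustration.
statement = enable_column_mapping_sql("main.default.events")
print(statement)
# With an active SparkSession you could then run: spark.sql(statement)
```

Building the statement as a string keeps the sketch runnable without a cluster; pass set_protocol=False to produce only the property shown in the SQL example above.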

Disable column mapping

In Databricks Runtime 15.3 and above, you can use the DROP FEATURE command to remove column mapping from a table and downgrade the table protocol.

important

Dropping column mapping from a table does not remove the random prefixes used in directory names for partitioned tables.

See Drop a Delta Lake table feature and downgrade table protocol.

Rename a column

note

Available in Databricks Runtime 10.4 LTS and above.

When column mapping is enabled for a Delta table, you can rename a column:

SQL
ALTER TABLE <table-name> RENAME COLUMN old_col_name TO new_col_name

For more examples, see Update Delta Lake table schema.

Drop columns

note

Available in Databricks Runtime 11.3 LTS and above.

When column mapping is enabled for a Delta table, you can drop one or more columns:

SQL
ALTER TABLE <table-name> DROP COLUMN col_name;
ALTER TABLE <table-name> DROP COLUMNS (col_name_1, col_name_2, ...);

For more details, see Update Delta Lake table schema.
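
Because column mapping permits names that contain spaces and other special characters, rename and drop statements often need quoted identifiers. The sketch below builds such statements in Python, quoting column names with backticks. The helper names quote_identifier, rename_column_sql, and drop_columns_sql are hypothetical, as are the table and column names; the helpers only build SQL strings and do not run them.

```python
from typing import Sequence


def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def rename_column_sql(table_name: str, old_name: str, new_name: str) -> str:
    """Build an ALTER TABLE ... RENAME COLUMN statement."""
    return (
        f"ALTER TABLE {table_name} RENAME COLUMN "
        f"{quote_identifier(old_name)} TO {quote_identifier(new_name)}"
    )


def drop_columns_sql(table_name: str, columns: Sequence[str]) -> str:
    """Build a DROP COLUMN (one name) or DROP COLUMNS (several names) statement."""
    if not columns:
        raise ValueError("at least one column name is required")
    quoted = [quote_identifier(c) for c in columns]
    if len(quoted) == 1:
        return f"ALTER TABLE {table_name} DROP COLUMN {quoted[0]}"
    return f"ALTER TABLE {table_name} DROP COLUMNS ({', '.join(quoted)})"


# Hypothetical table and column names, used only for illustration.
print(rename_column_sql("sales", "order id", "order_id"))
print(drop_columns_sql("sales", ["tmp col", "debug_flag"]))
# With an active SparkSession, each string could be passed to spark.sql(...).
```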

Supported characters in column names

When column mapping is enabled for a Delta table, column names can include spaces and any of the following characters: comma (,), semicolon (;), braces ({ and }), parentheses (( and )), newline (\n), tab (\t), and equals sign (=).
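
As a rough illustration, the Python sketch below flags column names that contain a space or any of the characters listed above, which can help identify which columns in an incoming CSV or JSON header depend on column mapping. The function name needs_column_mapping and the sample column names are hypothetical.

```python
# Space plus the special characters listed in this section.
SPECIAL_CHARS = set(" ,;{}()\n\t=")


def needs_column_mapping(column_name: str) -> bool:
    """Return True if the name contains a space or one of the listed characters."""
    return any(ch in SPECIAL_CHARS for ch in column_name)


# Hypothetical header from an ingested file.
columns = ["order id", "amount", "price(usd)", "status"]
flagged = [c for c in columns if needs_column_mapping(c)]
print(flagged)  # ['order id', 'price(usd)']
```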

Streaming with column mapping and schema changes

You can provide a schema tracking location to enable streaming from Delta tables with column mapping enabled. This overcomes an issue in which non-additive schema changes could result in broken streams.

Each streaming read against a data source must specify its own schemaTrackingLocation. That location must be inside the directory specified as the checkpointLocation for the streaming write to the target table. For streaming workloads that combine data from multiple source Delta tables, specify a unique directory within the checkpointLocation for each source table.
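
One way to satisfy both requirements for a multi-source stream is to derive each schema tracking path from the checkpoint path. The Python sketch below shows this under stated assumptions: the _schema_tracking subdirectory name, the helper names, and the table names are all hypothetical conventions chosen for illustration, not names required by Delta Lake.

```python
import posixpath
from typing import Dict, Sequence


def schema_tracking_locations(
    checkpoint_location: str, source_tables: Sequence[str]
) -> Dict[str, str]:
    """Map each source table to its own directory under the checkpoint location."""
    if len(set(source_tables)) != len(source_tables):
        raise ValueError("source table names must be unique")
    root = checkpoint_location.rstrip("/")
    # Hypothetical layout: <checkpoint>/_schema_tracking/<table name>
    return {
        table: posixpath.join(root, "_schema_tracking", table)
        for table in source_tables
    }


def is_within(parent: str, child: str) -> bool:
    """Return True if child is the parent directory itself or a path below it."""
    parent = parent.rstrip("/")
    child = child.rstrip("/")
    return child == parent or child.startswith(parent + "/")


# Hypothetical checkpoint path and source tables.
checkpoint_path = "/path/to/checkpointLocation"
locations = schema_tracking_locations(
    checkpoint_path, ["bronze.orders", "bronze.customers"]
)
for table, path in locations.items():
    assert is_within(checkpoint_path, path)
    print(table, "->", path)
```

Each resulting path would then be passed as the schemaTrackingLocation option on the streaming read for the corresponding source table.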

important

To enable column mapping on a currently running job, you must stop and restart the job at least twice:

The first restart initializes the column mapping.
The second restart enables the schema changes to take effect.

Any further schema changes (such as adding or dropping columns, or changing a column type) will also require you to restart the job.

Use the schemaTrackingLocation option to specify the path for schema tracking, as shown in the following code example:

Python
checkpoint_path = "/path/to/checkpointLocation"

(spark.readStream
  .option("schemaTrackingLocation", checkpoint_path)
  .table("delta_source_table")
  .writeStream
  .option("checkpointLocation", checkpoint_path)
  .toTable("output_table")
)
