Rename and drop columns with Delta Lake column mapping
Delta Lake column mapping enables metadata-only changes to rename or drop columns without rewriting data files. Column mapping also allows characters not supported by Parquet in column names, such as spaces, so you can directly ingest CSV or JSON data into Delta Lake without renaming columns.
Prerequisites
Before enabling column mapping, review the Limitations.
Column mapping requires the following:
- Delta protocols: reader version 2 or above, and writer version 5 or above. See Delta Lake feature compatibility and protocols.
- Reading tables with column mapping enabled requires Databricks Runtime 10.4 LTS and above
Enable column mapping
Use the following command to enable column mapping with id mode on a new table:
CREATE table <table-name> (
id INT,
name STRING
)
USING DELTA
TBLPROPERTIES (
'delta.columnMapping.mode' = 'id'
);
Use the following command to enable column mapping with name mode on an existing table:
ALTER TABLE <table-name> SET TBLPROPERTIES (
'delta.columnMapping.mode' = 'name'
)
For details on column mapping modes, see Column mapping modes.
Rename a column
When column mapping is enabled for a Delta Lake table, you can rename a column:
ALTER TABLE <table-name> RENAME COLUMN old_col_name TO new_col_name
For more examples, see Update table schemas with schema evolution.
Drop columns
When column mapping is enabled for a Delta Lake table, you can drop one or more columns:
ALTER TABLE table_name DROP COLUMN col_name
ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2, ...)
For more details, see Update table schemas with schema evolution.
Column mapping modes
The delta.columnMapping.mode table property enables metadata-only changes to mark columns as deleted or renamed without rewriting data files. The following modes are available:
none(default): Column mapping is not enabled. Column names are subject to Parquet naming constraints.name: This enables metadata-only column renames and drops, and allows special characters in column names.namemode may be set on new and existing tables.id: This enables metadata-only column renames and drops, and allows special characters in column names.idmode must be set on table creation and cannot be set on existing tables.
Databricks recommends id mode for most use cases for compatibility purposes. However, name mode is automatically set if you do not provide a value for delta.columnMapping.mode and you enable Iceberg compatibility features such as UniForm.
Supported characters in column names
When column mapping is enabled for a Delta Lake table, you can include spaces and any of these characters in column names: ,;{}()\n\t=.
Remove column mapping
Removing column mapping rewrites all data files to replace physical column names with logical names. This operation doesn't support row-level or physical conflict resolution.
Concurrent write operations will cause a ConcurrentModificationException.
Before removing column mapping:
- Pause all concurrent write operations, including streaming jobs and ETL pipelines.
- Turn off predictive optimization on the table.
- For large tables, schedule this operation during low-activity periods.
You can remove column mapping from a table using the following command:
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.columnMapping.mode' = 'none')
To remove column mapping and downgrade the table protocol, see Maintain compatibility.
Maintain compatibility
In Databricks Runtime 15.4 LTS and above, you can use the DROP FEATURE command to remove column mapping and downgrade the table protocol, maintaining compatibility with readers using Databricks Runtime 10.3 and below.
Dropping column mapping from a table doesn't remove the random prefixes used in directory names for partitioned tables.
See Drop a Delta Lake table feature and downgrade table protocol.
Column mapping and streaming
Non-additive schema changes, such as column renames or drops, can break streaming reads. Use a schemaTrackingLocation to allow Delta Lake to track schema changes and prevent stream failures.
Configuration
When configuring schemaTrackingLocation:
- Each streaming read from a source table must have its own
schemaTrackingLocation. - The
schemaTrackingLocationmust be within thecheckpointLocationdirectory of the streaming write target. - For workloads that read from multiple source tables, specify unique subdirectories within
checkpointLocationfor each source.
For a complete list of streaming Delta Lake options, see Delta Lake.
Enable column mapping on an active stream
To enable column mapping on an active streaming job:
- Stop the stream
- Enable column mapping on the table
- Restart the stream (first restart - initializes column mapping)
- Restart the stream again (second restart - enables schema changes)
Any further schema changes (adding or dropping columns, or changing column types) require you to restart the stream.
Example
To specify a schemaTrackingLocation for a streaming read from a Delta Lake table with column mapping, use the following example:
checkpoint_path = "/path/to/checkpointLocation"
(spark.readStream
.option("schemaTrackingLocation", checkpoint_path)
.table("delta_source_table")
.writeStream
.option("checkpointLocation", checkpoint_path)
.toTable("output_table")
)
Limitations
Enabling column mapping might break the following:
- Legacy workloads that rely on directory names for reading Delta Lake tables. Partitioned tables with column mapping use random prefixes instead of column names for partition directories. See Do Delta Lake and Parquet share partitioning strategies?.
- Downstream operations using change data feed. See Tables with column mapping.
- Streaming reads from the Delta Lake table, including in Lakeflow Spark Declarative Pipelines. See Column mapping and streaming.