This feature is in Public Preview.
Databricks supports column mapping for Delta Lake tables, which enables metadata-only changes to mark columns as deleted or renamed without rewriting data files. It also allows users to name Delta table columns using characters that are not allowed by Parquet, such as spaces, so that users can directly ingest CSV or JSON data into Delta without the need to rename columns due to previous character constraints.
Enabling column mapping also enables random file prefixes, which removes the ability to explore data using Hive-style partitioning. See Do Delta Lake and Parquet share partitioning strategies?.
Enabling column mapping on tables might break downstream operations that rely on Delta change data feed. See Change data feed limitations for tables with column mapping enabled.
Enabling column mapping on tables might break streaming read from the Delta table as a source, including in Delta Live Tables. See Streaming with column mapping and schema changes.
Enabling column mapping for a table upgrades the Delta table version. This protocol upgrade is irreversible. Tables with column mapping enabled can only be read in Databricks Runtime 10.2 and above.
Column mapping requires the following Delta protocols:
Reader version 2 or above.
Writer version 5 or above.
For a Delta table with the required protocol versions, you can enable column mapping by setting
You can use the following command to upgrade the table version and enable column mapping:
ALTER TABLE <table-name> SET TBLPROPERTIES ( 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name' )
You cannot turn off column mapping after you enable it. If you try to set
'delta.columnMapping.mode' = 'none', you’ll get an error.
Available in Databricks Runtime 10.2 and above.
When column mapping is enabled for a Delta table, you can rename a column:
ALTER TABLE <table-name> RENAME COLUMN old_col_name TO new_col_name
For more examples, see Update Delta Lake table schema.
Available in Databricks Runtime 11.0 and above.
When column mapping is enabled for a Delta table, you can drop one or more columns:
ALTER TABLE table_name DROP COLUMN col_name ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2, ...)
For more details, see Update Delta Lake table schema.
When column mapping is enabled for a Delta table, you can include spaces and any of these characters in the table’s column names:
This feature is in Public Preview in Databricks Runtime 13.1 and above.
You can provide a schema tracking location to enable streaming from Delta tables with column mapping enabled. This overcomes an issue in which non-additive schema changes could result in broken streams.
Each streaming read against a data source must have its own
schemaTrackingLocation specified. The specified
schemaTrackingLocation must be contained within the directory specified for the
checkpointLocation of the target table for streaming write.
For streaming workloads that combine data from multiple source Delta tables, you need to specify unique directories within the
checkpointLocation for each source table.
schemaTrackingLocation is used to specify the path for schema tracking, as shown in the following code example:
checkpoint_path = "/path/to/checkpointLocation" (spark.readStream .option("schemaTrackingLocation", checkpoint_path) .table("delta_source_table") .writeStream .option("checkpointLocation", checkpoint_path) .toTable("output_table") )