Drop or replace a Delta table
Databricks supports SQL standard DDL commands for dropping and replacing tables registered with either Unity Catalog or the Hive metastore. This article provides examples of dropping and replacing Delta tables and recommendations for syntax depending on your configured environment and desired outcome.
When to drop a table
You should use DROP TABLE to remove a table from the metastore when you want to permanently delete the table and have no intention of creating a new table in the same location. For example:

DROP TABLE table_name

DROP TABLE has different semantics depending on the type of table and whether the table is registered to Unity Catalog or the legacy Hive metastore.
| Table type | Metastore | Behavior |
| --- | --- | --- |
| Managed | Unity Catalog | The table is removed from the metastore and the underlying data is marked for deletion. You can use UNDROP to restore the table during the retention period. |
| Managed | Hive | The table is removed from the metastore and the underlying data is deleted. |
| External | Unity Catalog | The table is removed from the metastore, but the underlying data remains. URI access privileges are now governed by the external location that contains the data. |
| External | Hive | The table is removed from the metastore, but the underlying data remains. Any URI access privileges are unchanged. |
Although DROP TABLE semantics differ across table types (Unity Catalog tracks the history of Delta tables using an internal table ID), all tables share a common result: after the operation completes, the previously registered table name no longer has an active link to the data and table history in the metastore.
See DROP TABLE.
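For example, dropping an external table removes only the metastore entry; the underlying Delta files are untouched and can still be read by path, subject to the privileges on the external location. The following is a minimal sketch; the table name and storage path are hypothetical:

-- Hypothetical external table; dropping it removes only the metastore registration.
DROP TABLE IF EXISTS main.sales.orders_external;

-- The underlying Delta files remain at the external location and can still be queried by path.
SELECT * FROM delta.`s3://my-bucket/path/to/orders`;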
Note
Databricks does not recommend dropping and then recreating a table with the same name in production pipelines or systems, because this pattern can lead to unexpected results for concurrent operations. See Replace data with concurrent operations.
When to replace a table
Databricks recommends using CREATE OR REPLACE TABLE statements for use cases where you want to fully overwrite the target table with new data. For example, to overwrite a Delta table with all data from a Parquet directory, you could run the following command:
CREATE OR REPLACE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`
CREATE OR REPLACE TABLE has the same semantics regardless of the table type or metastore in use. The following are important advantages of CREATE OR REPLACE TABLE:

- Table contents are replaced, but the table identity is maintained.
- The table history is retained, and you can revert the table to an earlier version with the RESTORE command, as shown in the example below.
- The operation is a single transaction, so there is never a time when the table doesn't exist.
- Concurrent queries reading from the table can continue without interruption. Because the version before and after replacement still exists in the table history, concurrent queries can reference either version of the table as necessary.
See CREATE TABLE [USING].
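For example, if a replacement loads bad data, you can revert to the version of the table that existed before the CREATE OR REPLACE TABLE ran. The following is a minimal sketch, reusing the table name from the example above; the version number is illustrative:

-- Review the table history to find the version written before the replacement.
DESCRIBE HISTORY table_name;

-- Revert the table to that earlier version.
RESTORE TABLE table_name TO VERSION AS OF 2;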
Replace data with concurrent operations
Whenever you want to perform a full replacement of data in a table that might be used in concurrent operations, you must use CREATE OR REPLACE TABLE.
The following anti-pattern should not be used:
-- This is an anti-pattern. Avoid doing this!
DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`;
The reasons for this recommendation vary depending on whether you're using managed or external tables and whether you're using Unity Catalog, but across all Delta table types, this pattern can result in errors, dropped records, or corrupted results.
Instead, Databricks recommends always using CREATE OR REPLACE TABLE, as in the following example:
CREATE OR REPLACE TABLE table_name
AS SELECT * FROM parquet.`/path/to/files`
Because the table history is maintained through the atomic data replacement, concurrent transactions can validate the version of the source table they referenced, and can therefore fail or reconcile as necessary without introducing unexpected behavior or results.
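Because both the pre-replacement and post-replacement versions remain in the table history, readers can also pin a query to a specific version with time travel. A brief sketch; the version number is illustrative:

-- List table versions, including the new version written by CREATE OR REPLACE TABLE.
DESCRIBE HISTORY table_name;

-- Query the data as it existed before the replacement.
SELECT * FROM table_name VERSION AS OF 1;

-- Query the current, replaced data.
SELECT * FROM table_name;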