Create tables

This article introduces the concept of managed and external tables in Unity Catalog and describes how to create tables in Unity Catalog.

Note

When you create a table, be sure to reference a catalog that is governed by Unity Catalog. The catalog hive_metastore appears in Data Explorer but is not considered governed by Unity Catalog. It is managed by your Databricks workspace’s Hive metastore. All other catalogs listed are governed by Unity Catalog.

You can use the Unity Catalog table upgrade interface to upgrade existing tables registered in the Hive metastore to Unity Catalog. See Upgrade tables and views to Unity Catalog.

Managed tables

Managed tables are the default way to create tables in Unity Catalog. They use the Delta table format.

By default, managed tables are stored in the root storage location you configured when you created the metastore. However, you can specify a different storage location at the catalog or schema level. Managed table data is stored at the lowest level in the hierarchy for which a location has been specified. For example, if a schema location is specified, it will be used. If no schema location is specified, a catalog location will be used, and if there is no catalog location, the default metastore root location is used.
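For example, the following is a minimal sketch of setting catalog-level and schema-level managed storage locations (the catalog, schema, and S3 path names are hypothetical, and it assumes your metastore supports the MANAGED LOCATION clause). Managed tables created in my_catalog.my_schema would then be stored under the schema-level path:

-- Sketch with hypothetical names: set managed storage locations at the
-- catalog and schema level; the schema-level path takes precedence for
-- managed tables created in my_catalog.my_schema.
CREATE CATALOG IF NOT EXISTS my_catalog
  MANAGED LOCATION 's3://my-bucket/catalog-storage';

CREATE SCHEMA IF NOT EXISTS my_catalog.my_schema
  MANAGED LOCATION 's3://my-bucket/schema-storage';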

External tables

External tables are tables whose data is stored outside of the managed storage locations used by Unity Catalog (the metastore root location or a catalog- or schema-level managed location). Use external tables only when you require direct access to the data using tools outside of Databricks.

When you run DROP TABLE on an external table, Unity Catalog does not delete the underlying data. You can manage privileges on external tables and use them in queries in the same way as managed tables. To create an external table with SQL, specify a LOCATION path in your CREATE TABLE statement. External tables can use the following file formats:

  • DELTA

  • CSV

  • JSON

  • AVRO

  • PARQUET

  • ORC

  • TEXT
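For example, here is a minimal sketch of an external table over CSV files (the table, column, and bucket names are hypothetical, and the LOCATION path must be covered by an external location or storage credential that you can access):

-- Sketch with hypothetical names: an external table backed by CSV files.
CREATE TABLE my_catalog.my_schema.raw_events
(
  event_id   INT,
  event_name STRING
)
USING CSV
OPTIONS (header = 'true')
LOCATION 's3://my-bucket/raw-events';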

To manage access to the underlying cloud storage for an external table, you must set up storage credentials and external locations.

To learn more, see Create an external table.

Requirements

You must have the CREATE TABLE privilege on the schema in which you want to create the table, as well as the USE SCHEMA privilege on the schema and the USE CATALOG privilege on the parent catalog.
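For illustration, a catalog owner or metastore admin could grant those privileges with statements like the following sketch (the catalog, schema, and group names are hypothetical):

-- Sketch with hypothetical names: grant the privileges required to create
-- tables in my_catalog.my_schema to a group named data-engineers.
GRANT USE CATALOG ON CATALOG my_catalog TO `data-engineers`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA my_catalog.my_schema TO `data-engineers`;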

If you are creating an external table, see Create an external table for additional requirements.

Create a managed table

To create a managed table, run the following SQL command. You can also use the example notebook to create a table. Replace the placeholder values:

  • <catalog_name>: The name of the catalog. This cannot be the hive_metastore catalog that is created automatically for the Hive metastore associated with your Databricks workspace.

  • <schema_name>: The name of the schema.

  • <table_name>: A name for the table.

  • <column_specification>: The name and data type for each column.

SQL

CREATE TABLE <catalog_name>.<schema_name>.<table_name>
(
  <column_specification>
);

Python

spark.sql("CREATE TABLE <catalog_name>.<schema_name>.<table_name> "
  "("
  "  <column_specification>"
  ")")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog_name>.<schema_name>.<table_name> ",
  "(",
  "  <column_specification>",
  ")",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog_name>.<schema_name>.<table_name> " +
  "(" +
  "  <column_specification>" +
  ")")

You can also create a managed table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.

For example, to create the table main.default.department and insert five rows into it:

SQL

CREATE TABLE main.default.department
(
  deptcode  INT,
  deptname  STRING,
  location  STRING
);

INSERT INTO main.default.department VALUES
  (10, 'FINANCE', 'EDINBURGH'),
  (20, 'SOFTWARE', 'PADDINGTON'),
  (30, 'SALES', 'MAIDSTONE'),
  (40, 'MARKETING', 'DARLINGTON'),
  (50, 'ADMIN', 'BIRMINGHAM');
spark.sql("CREATE TABLE main.default.department "
  "("
  "  deptcode  INT,"
  "  deptname  STRING,"
  "  location  STRING"
  ")"
  "INSERT INTO main.default.department VALUES "
  "  (10, 'FINANCE', 'EDINBURGH'),"
  "  (20, 'SOFTWARE', 'PADDINGTON'),"
  "  (30, 'SALES', 'MAIDSTONE'),"
  "  (40, 'MARKETING', 'DARLINGTON'),"
  "  (50, 'ADMIN', 'BIRMINGHAM')")
library(SparkR)

sql(paste("CREATE TABLE main.default.department ",
  "(",
  "  deptcode  INT,",
  "  deptname  STRING,",
  "  location  STRING",
  ")",
  "INSERT INTO main.default.department VALUES ",
  "  (10, 'FINANCE', 'EDINBURGH'),",
  "  (20, 'SOFTWARE', 'PADDINGTON'),",
  "  (30, 'SALES', 'MAIDSTONE'),",
  "  (40, 'MARKETING', 'DARLINGTON'),",
  "  (50, 'ADMIN', 'BIRMINGHAM')",
  sep = ""))
spark.sql("CREATE TABLE main.default.department " +
  "(" +
  "  deptcode  INT," +
  "  deptname  STRING," +
  "  location  STRING" +
  ")" +
  "INSERT INTO main.default.department VALUES " +
  "  (10, 'FINANCE', 'EDINBURGH')," +
  "  (20, 'SOFTWARE', 'PADDINGTON')," +
  "  (30, 'SALES', 'MAIDSTONE')," +
  "  (40, 'MARKETING', 'DARLINGTON')," +
  "  (50, 'ADMIN', 'BIRMINGHAM')")

Example notebooks for creating managed tables

You can use the following example notebooks to create a catalog, schema, and managed table, and to manage permissions on them.

Create and manage a table in Unity Catalog with SQL

Create and manage a table in Unity Catalog with Python

Drop a managed table

To drop a managed table, run the following SQL command:

DROP TABLE IF EXISTS catalog_name.schema_name.table_name;

When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.

Create an external table

The data in an external table is stored in a path on your cloud tenant. To work with external tables, Unity Catalog introduces two new objects to access and work with external cloud storage:

  • A storage credential contains an authentication method for accessing a cloud storage location. The storage credential does not contain a mapping to the path to which it grants access. Storage credentials are access-controlled to determine which users can use the credential. To use a storage credential directly, add WITH (CREDENTIAL <credential_name>) to your SQL command.

  • An external location maps a storage credential to a cloud storage path to which it grants access. The external location grants access only to that cloud storage path and its contents. External locations are access-controlled to determine which users can use them. An external location is used automatically when your SQL command contains a LOCATION clause and no WITH (CREDENTIAL <credential_name>) clause.
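For reference, an external location can be defined with SQL similar to the following sketch (the names and path are hypothetical, and it assumes a storage credential named my_credential already exists):

-- Sketch with hypothetical names: map an existing storage credential to a
-- cloud storage path by creating an external location named my_location.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_location
  URL 's3://my-bucket/external-data'
  WITH (STORAGE CREDENTIAL my_credential);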

Requirements

  • To create an external table, you must have:

    • The CREATE EXTERNAL TABLE privilege on an external location or storage credential that grants access to the LOCATION accessed by the external table.

    • The USE SCHEMA permission on the table’s parent schema.

    • The USE CATALOG permission on the table’s parent catalog.

    • The CREATE TABLE permission on the table’s parent schema.

External locations and storage credentials are stored in the top level of the metastore, rather than in a catalog. To create a storage credential or an external location, you must be the metastore admin or an account-level admin. See Manage external locations and storage credentials.
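Once an external location exists, its owner or a metastore admin can grant the privileges listed above, for example (the external location and group names below are hypothetical):

-- Sketch with hypothetical names: allow a group to create external tables
-- against, and read files from, the external location my_location.
GRANT CREATE EXTERNAL TABLE, READ FILES ON EXTERNAL LOCATION my_location TO `data-engineers`;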

To create an external table, follow these high-level steps. You can also use an example notebook to create the storage credential, external location, and external table and manage permissions for them.

Create an external table

You can create an external table using an external location (recommended) or using a storage credential directly. In the following examples, replace the placeholder values:

  • <catalog>: The name of the catalog that will contain the table.

  • <schema>: The name of the schema that will contain the table.

  • <table_name>: A name for the table.

  • <column_specification>: The name and data type for each column.

  • <bucket_path>: The path on your cloud tenant where the table will be created.

  • <table_directory>: A directory where the table will be created. Use a unique directory for each table.

Important

Once a table is created in a path, users can no longer directly access the files in that path even if they have been given privileges on an external location or storage credential to do so. This is to ensure that users cannot circumvent access controls applied to tables by reading files from your cloud tenant directly.

Create an external table using an external location

To create an empty external table using an external location:

SQL

CREATE TABLE <catalog>.<schema>.<table_name>
(
  <column_specification>
)
LOCATION 's3://<bucket_path>/<table_directory>';

Python

spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> "
  "("
  "  <column_specification>"
  ") "
  "LOCATION 's3://<bucket_path>/<table_directory>'")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog>.<schema>.<table_name> ",
  "(",
  "  <column_specification>",
  ") ",
  "LOCATION 's3://<bucket_path>/<table_directory>'",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> " +
  "(" +
  "  <column_specification>" +
  ") " +
  "LOCATION 's3://<bucket_path>/<table_directory>'")

Unity Catalog checks that you have the following permissions:

  • CREATE EXTERNAL TABLE on the external location that references the cloud storage path you specify.

  • CREATE TABLE on the parent schema.

  • USE SCHEMA on the parent schema.

  • USE CATALOG on the parent catalog.

If so, the external table is created. Otherwise, an error occurs and the external table is not created.

Note

You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade an external table to Unity Catalog.

You can also create an external table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.

Create an external table using a storage credential directly

Note

Databricks recommends that you use external locations, rather than using storage credentials directly.

To create an external table using a storage credential, add a WITH (CREDENTIAL <storage_credential>) clause to your SQL statement:

SQL

CREATE TABLE <catalog>.<schema>.<table_name>
(
  <column_specification>
)
LOCATION 's3://<bucket_path>/<table_directory>'
WITH (CREDENTIAL <storage_credential>);

Python

spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> "
  "( "
  "  <column_specification> "
  ") "
  "LOCATION 's3://<bucket_path>/<table_directory>' "
  "WITH (CREDENTIAL <storage_credential>)")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog>.<schema>.<table_name> ",
  "( ",
  "  <column_specification> ",
  ") ",
  "LOCATION 's3://<bucket_path>/<table_directory>' ",
  "WITH (CREDENTIAL <storage_credential>)",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> " +
  "( " +
  "  <column_specification> " +
  ") " +
  "LOCATION 's3://<bucket_path>/<table_directory>' " +
  "WITH (CREDENTIAL <storage_credential>)")

Unity Catalog checks whether you have the CREATE TABLE permission on the storage credential you specify, and whether the storage credential authorizes reading from and writing to the location you specified in the LOCATION clause. If both of these things are true, the external table is created. Otherwise, an error occurs and the external table is not created.

Example notebook for creating an external table

Create and manage an external table in Unity Catalog

Create a table from files stored in your cloud tenant

You can populate a managed or external table with records from files stored in your cloud tenant. Unity Catalog reads the files at that location and inserts their contents into the table. In Unity Catalog, this is called path-based access.

Note

A storage path where you create an external table cannot also be used to read or write data files.

Explore the contents of the files

To explore data stored in an external location:

  1. List the files in a cloud storage path:

    SQL

    LIST 's3://<path_to_files>';

    Python

    display(spark.sql("LIST 's3://<path_to_files>'"))

    R

    library(SparkR)

    display(sql("LIST 's3://<path_to_files>'"))

    Scala

    display(spark.sql("LIST 's3://<path_to_files>'"))
    

    If you have the READ FILES permission on the external location associated with the cloud storage path, a list of data files in that location is returned.

  2. Query the data in the files in a given path:

    SQL

    SELECT * FROM <format>.`s3://<path_to_files>`;

    Python

    display(spark.read.load("s3://<path_to_files>"))

    R

    library(SparkR)

    display(loadDF("s3://<path_to_files>"))

    Scala

    display(spark.read.load("s3://<path_to_files>"))
    

    To explore data using a storage credential directly:

    SQL

    SELECT * FROM <format>.`s3://<path_to_files>`
    WITH (CREDENTIAL <storage_credential>);

    Python

    display(spark.sql("SELECT * FROM <format>.`s3://<path_to_files>` "
      "WITH (CREDENTIAL <storage_credential>)"))

    R

    library(SparkR)

    display(sql(paste("SELECT * FROM <format>.`s3://<path_to_files>` ",
      "WITH (CREDENTIAL <storage_credential>)",
      sep = "")))

    Scala

    display(spark.sql("SELECT * FROM <format>.`s3://<path_to_files>` " +
      "WITH (CREDENTIAL <storage_credential>)"))
    

Create a table from the files

Note

You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade an external table to Unity Catalog.

  1. Create a new table and populate it with records from data files on your cloud tenant.

    Important

    • When you create a table using this method, the storage path is read only once, to prevent duplication of records. If you want to re-read the contents of the directory, you must drop and re-create the table. For an existing table, you can insert records from a storage path.

    • The bucket path where you create a table cannot also be used to read or write data files.

    • Only the files in the exact directory are read; the read is not recursive.

    • You must have the following permissions:

      • USE CATALOG on the parent catalog and USE SCHEMA on the schema.

      • CREATE TABLE on the parent schema.

      • READ FILES on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.

      • If you are creating an external table, you need CREATE EXTERNAL TABLE on the bucket path where the table will be created.

    • To create a managed table and populate it with records from a bucket path:

      SQL

      CREATE TABLE <catalog>.<schema>.<table_name>
      (
        <column_specification>
      )
      USING DELTA
      AS SELECT * FROM delta.`s3://<path_to_files>`;

      Python

      spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> "
        "( "
        "  <column_specification> "
        ") "
        "USING DELTA "
        "AS SELECT * FROM delta.`s3://<path_to_files>`")

      R

      library(SparkR)

      sql(paste("CREATE TABLE <catalog>.<schema>.<table_name> ",
        "( ",
        "  <column_specification> ",
        ") ",
        "USING DELTA ",
        "AS SELECT * FROM delta.`s3://<path_to_files>`",
        sep = ""))

      Scala

      spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> " +
        "( " +
        "  <column_specification> " +
        ") " +
        "USING DELTA " +
        "AS SELECT * FROM delta.`s3://<path_to_files>`")
      

      To use a storage credential, add WITH (CREDENTIAL <storage_credential>) to the command.

    • To create an external table and populate it with records from a bucket path, add a LOCATION clause:

      SQL

      CREATE TABLE <catalog>.<schema>.<table_name>
      (
        <column_specification>
      )
      USING DELTA
      LOCATION 's3://<table_location>'
      AS SELECT * FROM <format>.`s3://<path_to_files>`;

      Python

      spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> "
        "( "
        "  <column_specification> "
        ") "
        "USING DELTA "
        "LOCATION 's3://<table_location>' "
        "AS SELECT * FROM <format>.`s3://<path_to_files>`")

      R

      library(SparkR)

      sql(paste("CREATE TABLE <catalog>.<schema>.<table_name> ",
        "( ",
        "  <column_specification> ",
        ") ",
        "USING DELTA ",
        "LOCATION 's3://<table_location>' ",
        "AS SELECT * FROM <format>.`s3://<path_to_files>`",
        sep = ""))

      Scala

      spark.sql("CREATE TABLE <catalog>.<schema>.<table_name> " +
        "( " +
        "  <column_specification> " +
        ") " +
        "USING DELTA " +
        "LOCATION 's3://<table_location>' " +
        "AS SELECT * FROM <format>.`s3://<path_to_files>`")
      

      To use a storage credential directly, add WITH (CREDENTIAL <storage_credential>) to the command.

Insert records from a path into an existing table

To insert records from a bucket path into an existing table, use the COPY INTO command. In the following examples, replace the placeholder values:

  • <catalog>: The name of the table’s parent catalog.

  • <schema>: The name of the table’s parent schema.

  • <path_to_files>: The bucket path that contains the data files.

  • <format>: The format of the files.

  • <table_location>: The bucket path where the table will be created.

  • <storage_credential>: If you are using a storage credential directly, the name of the storage credential that authorizes reading from or writing to the bucket path.

Important

  • When you insert records into a table using this method, the bucket path you provide is read only once, to prevent duplication of records.

  • The bucket path where you create a table cannot also be used to read or write data files.

  • Only the files in the exact directory are read; the read is not recursive.

  • You must have the following permissions:

    • USE CATALOG on the parent catalog and USE SCHEMA on the schema.

    • MODIFY on the table.

    • READ FILES on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.

    • To insert records into an external table, you need CREATE EXTERNAL TABLE on the bucket path where the table is located.

  • To insert records from files in a bucket path into a managed table, using an external location to read from the bucket path:

    SQL

    COPY INTO <catalog>.<schema>.<table>
    FROM (
      SELECT *
      FROM 's3://<path_to_files>'
    )
    FILEFORMAT = <format>;

    Python

    spark.sql("COPY INTO <catalog>.<schema>.<table> "
      "FROM ( "
      "  SELECT * "
      "  FROM 's3://<path_to_files>' "
      ") "
      "FILEFORMAT = <format>")

    R

    library(SparkR)

    sql(paste("COPY INTO <catalog>.<schema>.<table> ",
      "FROM ( ",
      "  SELECT * ",
      "  FROM 's3://<path_to_files>' ",
      ") ",
      "FILEFORMAT = <format>",
      sep = ""))

    Scala

    spark.sql("COPY INTO <catalog>.<schema>.<table> " +
      "FROM ( " +
      "  SELECT * " +
      "  FROM 's3://<path_to_files>' " +
      ") " +
      "FILEFORMAT = <format>")
    

    To use a storage credential directly, add WITH (CREDENTIAL <storage_credential>) to the command.

  • To insert into an external table, add a LOCATION clause:

    SQL

    COPY INTO <catalog>.<schema>.<table>
    LOCATION 's3://<table_location>'
    FROM (
      SELECT *
      FROM 's3://<path_to_files>'
    )
    FILEFORMAT = <format>;

    Python

    spark.sql("COPY INTO <catalog>.<schema>.<table> "
      "LOCATION 's3://<table_location>' "
      "FROM ( "
      "  SELECT * "
      "  FROM 's3://<path_to_files>' "
      ") "
      "FILEFORMAT = <format>")

    R

    library(SparkR)

    sql(paste("COPY INTO <catalog>.<schema>.<table> ",
      "LOCATION 's3://<table_location>' ",
      "FROM ( ",
      "  SELECT * ",
      "  FROM 's3://<path_to_files>' ",
      ") ",
      "FILEFORMAT = <format>",
      sep = ""))

    Scala

    spark.sql("COPY INTO <catalog>.<schema>.<table> " +
      "LOCATION 's3://<table_location>' " +
      "FROM ( " +
      "  SELECT * " +
      "  FROM 's3://<path_to_files>' " +
      ") " +
      "FILEFORMAT = <format>")
    

    To use a storage credential directly, add WITH (CREDENTIAL <storage_credential>) to the command.