CREATE TABLE USING (SQL Analytics)

Defines a table using a data source.

Syntax

CREATE TABLE [ IF NOT EXISTS ] table_identifier
    [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
    [ USING data_source ]
    [ OPTIONS ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
    [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
    [ LOCATION path ]
    [ COMMENT table_comment ]
    [ TBLPROPERTIES ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
    [ AS select_statement ]

The clauses between the USING clause and the AS SELECT clause can appear in any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.

Parameters

  • table_identifier

    A table name, optionally qualified with a database name.

    Syntax: [ database_name. ] table_name

  • IF NOT EXISTS

    If a table with the same name already exists, the statement has no effect. IF NOT EXISTS cannot coexist with OR REPLACE, which means CREATE OR REPLACE TABLE IF NOT EXISTS is not allowed.

  • USING data_source

    The file format to use for the table. data_source must be one of TEXT, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA.

    Important

    • When USING is omitted or data_source is DELTA, the additional options in Delta table options apply.
    • When data_source is TEXT, CSV, JSON, JDBC, PARQUET, or ORC, you must specify LOCATION, and AS select_statement is not allowed.
  • PARTITIONED BY

    Partitions the table by the specified columns.

  • LOCATION

    Path to the directory where table data is stored, which could be a path on distributed storage.

  • OPTIONS

    Table options to optimize the behavior of the table. This clause is not supported when the data source is DELTA. See the sketch after this parameter list.

  • COMMENT

    A string literal to describe the table.

  • TBLPROPERTIES

    A list of key-value pairs used to tag the table definition.

  • AS select_statement

    Populates the table using the data from select_statement.
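
The following sketch combines IF NOT EXISTS, USING, OPTIONS, and LOCATION for a CSV table. The table name, path, and CSV option values shown are illustrative, not defaults.

--Creates a CSV table over an existing directory; does nothing if the table already exists
--Table name, path, and option values are illustrative
CREATE TABLE IF NOT EXISTS sales_raw (id INT, amount DOUBLE, sale_date STRING)
    USING CSV
    OPTIONS ('header' = 'true', 'sep' = ',')
    LOCATION '/mnt/csv_files/sales'
    COMMENT 'raw sales data read from CSV files';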

Delta table options

In addition to the standard CREATE TABLE options, Delta tables support the options described in this section.

CREATE [ OR REPLACE ] ... table_identifier
  [ ( col_name1 col_type1 [ NOT NULL ] [ COMMENT col_comment1 ], ... ) ]
  [ USING DELTA ]
  [ LOCATION <path-to-delta-files> ]

Parameters

  • OR REPLACE

    If a table with the same name already exists, the table is replaced with the new configuration.

    Note

    Databricks strongly recommends using OR REPLACE instead of dropping and re-creating tables.

  • table_identifier

    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`: The location of an existing Delta table.
  • NOT NULL

    Indicates that the column cannot contain NULL values. The default is to allow NULL values. If specified, any change to the Delta table checks these NOT NULL constraints. See the sketch after this parameter list.

    For details, see NOT NULL constraint.

  • LOCATION <path-to-delta-files>

    If you specify a LOCATION that already contains data stored in Delta Lake, Delta Lake does the following:

    • If you specify only the table name and location, for example:

      CREATE TABLE events
        LOCATION '/mnt/delta/events'
      

      the table in the Hive metastore automatically inherits the schema, partitioning, and table properties of the existing data. This functionality can be used to “import” data into the metastore.

    • If you specify any configuration (schema, partitioning, or table properties), Delta Lake verifies that the specification exactly matches the configuration of the existing data.

    Warning

    If the specified configuration does not exactly match the configuration of the data, Delta Lake throws an exception that describes the discrepancy.
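
As a sketch of how the Delta-specific clauses combine (the table and column names are illustrative), the statement below replaces the table events if it already exists and enforces that event_id can never be NULL:

--Replaces the table if it exists; writes that put NULL into event_id fail the constraint check
--Table and column names are illustrative
CREATE OR REPLACE TABLE events (
    event_id BIGINT NOT NULL COMMENT 'unique event identifier',
    event_time TIMESTAMP,
    payload STRING)
  USING DELTA;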

Data source interaction

A data source table acts like a pointer to the underlying data source. For example, you can create a table foo in Databricks that points to a file bar. When you read and write table foo, you actually read and write the file bar. If you don’t specify the LOCATION, Databricks creates a default table location.

For CREATE TABLE AS SELECT, Databricks overwrites the underlying data source with the data of the input query, to make sure that the created table contains exactly the same data as the input query.
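
For instance (the table name and path below are illustrative), a table defined over an external Parquet directory is only a pointer to the files at that path, and querying the table reads those files directly:

--Illustrative sketch: the table points at the Parquet files under the given path
CREATE TABLE events_parquet USING PARQUET LOCATION '/mnt/parquet/events';
--Reading the table reads the files at that location
SELECT count(*) FROM events_parquet;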

Examples

--Creates a Delta table
CREATE TABLE student (id INT, name STRING, age INT);
--Use data from another table
CREATE TABLE student_copy AS SELECT * FROM student;
--Creates a CSV table from an external directory
CREATE TABLE student USING CSV LOCATION '/mnt/csv_files';
--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    TBLPROPERTIES ('foo'='bar');
--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';
--Create partitioned table
CREATE TABLE student (id INT, name STRING, age INT)
    PARTITIONED BY (age);