Create Table

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
    [(col_name1 col_type1 [COMMENT col_comment1], ...)]
    USING datasource
    [OPTIONS (key1=val1, key2=val2, ...)]
    [PARTITIONED BY (col_name1, col_name2, ...)]
    [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
    [LOCATION path]
    [COMMENT table_comment]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
    [AS select_statement]

Create a table using a data source. If a table with the same name already exists in the database, an exception is thrown.

TEMPORARY
The created table will be available only in this session and will not be persisted to the underlying metastore, if any. This may not be specified with IF NOT EXISTS or AS <select_statement>. To use AS <select_statement> with TEMPORARY, one option is to create a TEMPORARY VIEW instead.
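
For example, a minimal sketch of the TEMPORARY VIEW alternative (the view and source table names are illustrative):

CREATE TEMPORARY VIEW boxes_view
    AS SELECT * FROM boxes
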
IF NOT EXISTS
If a table with the same name already exists in the database, nothing will happen. This may not be specified when creating a temporary table.
USING <data source>

The file format to use for the table. One of TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE, DELTA, or LIBSVM, or a fully-qualified class name of a custom implementation of org.apache.spark.sql.sources.DataSourceRegister.

Important

Since Databricks Runtime 3.0, HIVE is also supported in USING <data source> to create a Hive SerDe table. The Hive-specific file_format and row_format can be specified using the OPTIONS clause, which is a case-insensitive string map. The option keys are FILEFORMAT, INPUTFORMAT, OUTPUTFORMAT, SERDE, FIELDDELIM, ESCAPEDELIM, MAPKEYDELIM, and LINEDELIM.
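
For illustration, a sketch of a delimited-text Hive SerDe table declared through these option keys (the table and column names are illustrative):

CREATE TABLE hive_delimited (id INT, name STRING)
    USING HIVE
    OPTIONS (FILEFORMAT 'textfile', FIELDDELIM ',')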

PARTITIONED BY
The created table will be partitioned by the specified columns. A directory will be created for each partition.
CLUSTERED BY
Each partition in the created table will be split into a fixed number of buckets by the specified columns. This is typically used with partitioning to read and shuffle less data. Support for SORTED BY will be added in a future version.
LOCATION

The created table will use the specified directory to store its data. This clause automatically implies EXTERNAL.

(Databricks Delta) When you specify a LOCATION that already contains data stored using Databricks Delta, Databricks Delta does the following:

  • If you specify only the table name and location, for example:

    CREATE TABLE events
    USING DELTA
    LOCATION '/delta/events'
    

    the table in the Hive metastore automatically inherits the schema, partitioning, and table properties of the existing data. This functionality can be used to “import” data into the metastore.

  • If you specify any configuration (schema, partitioning, or table properties), Databricks Delta verifies that the specification exactly matches the configuration of the existing data; see the example sketched after the warning below.

Warning

If the specified configuration does not exactly match the configuration of the data, Databricks Delta throws an exception that describes the discrepancy.
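
For illustration, a sketch of specifying the full configuration at an existing Databricks Delta location; it assumes the data at /delta/events was written with exactly this schema and partitioning (the column names are illustrative):

CREATE TABLE events (
    date DATE,
    eventType STRING,
    eventId STRING)
USING DELTA
PARTITIONED BY (date)
LOCATION '/delta/events'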

AS <select_statement>
Populate the table with input data from the select statement. This may not be specified with TEMPORARY TABLE or with a column list. To specify it with TEMPORARY, use CREATE TEMPORARY VIEW instead.

Examples:

CREATE TABLE boxes (width INT, length INT, height INT) USING CSV

CREATE TEMPORARY TABLE boxes
    (width INT, length INT, height INT)
    USING PARQUET
    OPTIONS ('compression'='snappy')

CREATE TABLE rectangles
    USING PARQUET
    PARTITIONED BY (width)
    CLUSTERED BY (length) INTO 8 BUCKETS
    AS SELECT * FROM boxes

-- Create a Hive SerDe table using the CREATE TABLE USING syntax.
CREATE TABLE my_table (name STRING, age INT, hair_color STRING)
    USING HIVE
    OPTIONS(
        INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat',
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat',
        SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe')
    PARTITIONED BY (hair_color)
    TBLPROPERTIES ('status'='staging', 'owner'='andrew')

Create Table with Hive format

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
    [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
    [COMMENT table_comment]
    [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION path]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
    [AS select_statement]

row_format:
    : SERDE serde_cls [WITH SERDEPROPERTIES (key1=val1, key2=val2, ...)]
    | DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
        [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char]
        [LINES TERMINATED BY char]
        [NULL DEFINED AS char]

file_format:
    : TEXTFILE | SEQUENCEFILE | RCFILE | ORC | PARQUET | AVRO
    | INPUTFORMAT input_fmt OUTPUTFORMAT output_fmt

Create a table using the Hive format. If a table with the same name already exists in the database, an exception will be thrown. When the table is dropped later, its data will be deleted from the file system.

Note

This command is supported only when Hive support is enabled.

EXTERNAL
The table uses the custom directory specified with LOCATION. Queries on the table access existing data previously stored in the directory. When an EXTERNAL table is dropped, its data is not deleted from the file system. This flag is implied if LOCATION is specified.
IF NOT EXISTS
If a table with the same name already exists in the database, nothing will happen.
PARTITIONED BY
Partition the table by the specified columns. This set of columns must be distinct from the set of non-partitioned columns. Partitioned columns cannot be specified with AS <select_statement>.
ROW FORMAT
Use the SERDE clause to specify a custom SerDe for this table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character, and so on.
STORED AS
Specify the file format for this table. Available formats include TEXTFILE, SEQUENCEFILE, RCFILE, ORC, PARQUET, and AVRO. Alternatively, you can specify your own input and output formats through INPUTFORMAT and OUTPUTFORMAT. Only formats TEXTFILE, SEQUENCEFILE, and RCFILE can be used with ROW FORMAT SERDE and only TEXTFILE can be used with ROW FORMAT DELIMITED.
LOCATION
Use the specified directory to store the table data. This clause automatically implies EXTERNAL.
AS <select_statement>
Populate the table with input data from the select statement. This may not be specified with PARTITIONED BY.

Examples:

CREATE TABLE my_table (name STRING, age INT)

CREATE TABLE my_table (name STRING, age INT)
    COMMENT 'This table is partitioned'
    PARTITIONED BY (hair_color STRING COMMENT 'This is a column comment')
    TBLPROPERTIES ('status'='staging', 'owner'='andrew')

CREATE TABLE my_table (name STRING, age INT)
    COMMENT 'This table specifies a custom SerDe'
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS
        INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

CREATE TABLE my_table (name STRING, age INT)
    COMMENT 'This table uses the CSV format'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE

CREATE TABLE your_table
    COMMENT 'This table is created with existing data'
    AS SELECT * FROM my_table


CREATE EXTERNAL TABLE IF NOT EXISTS my_table (name STRING, age INT)
    COMMENT 'This table is created with existing data'
    LOCATION 'spark-warehouse/tables/my_existing_table'

Create Table Like

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name1 LIKE [db_name.]table_name2 [LOCATION path]

Create a managed table using the definition/metadata of an existing table or view. The created table always uses its own directory in the default warehouse location. Since Databricks Runtime 3.0, you can specify LOCATION to create an EXTERNAL table.
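
Examples (sketches only; the table names and path are illustrative, and boxes refers to the table created in the earlier examples):

CREATE TABLE boxes_copy LIKE boxes

CREATE TABLE IF NOT EXISTS boxes_external LIKE boxes LOCATION '/mnt/data/boxes'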