Create Table

Create Table Using

CREATE [OR REPLACE] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name1 col_type1 [COMMENT col_comment1], ...)]
  USING data_source
  [OPTIONS (key1=val1, key2=val2, ...)]
  [PARTITIONED BY (col_name1, col_name2, ...)]
  [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
  [LOCATION path]
  [COMMENT table_comment]
  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
  [AS select_statement]

Create a table using a data source. If a table with the same name already exists in the database, an exception is thrown.

OR REPLACE

If a table with the same name already exists, the table is replaced with the new configuration.

Note

This syntax is available in Databricks Runtime 7.0 and above. We strongly recommend using REPLACE instead of dropping and re-creating tables.
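For example, a minimal sketch using a hypothetical events Delta table; the statement replaces the table definition if the table already exists, and creates it otherwise:

CREATE OR REPLACE TABLE events (eventDate DATE, eventId STRING)
  USING DELTA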

IF NOT EXISTS

If a table with the same name already exists in the database, nothing will happen.
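For example, re-running the boxes definition from the examples below with IF NOT EXISTS is a no-op if the table already exists:

CREATE TABLE IF NOT EXISTS boxes (width INT, length INT, height INT)
  USING CSV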

USING data_source

The file format to use for the table. data_source must be one of TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE, DELTA, or LIBSVM, or a fully-qualified class name of a custom implementation of org.apache.spark.sql.sources.DataSourceRegister.

Important

Since Databricks Runtime 3.0, you can use HIVE to create a Hive SerDe table. You can specify the Hive-specific file_format and row_format using the OPTIONS clause, which is a case-insensitive string map. The option keys are FILEFORMAT, INPUTFORMAT, OUTPUTFORMAT, SERDE, FIELDDELIM, ESCAPEDELIM, MAPKEYDELIM, and LINEDELIM.
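For example, a minimal sketch (with a hypothetical my_text_table) that creates a delimited text Hive SerDe table by passing the FILEFORMAT and FIELDDELIM keys through OPTIONS:

CREATE TABLE my_text_table (name STRING, age INT)
  USING HIVE
  OPTIONS (FILEFORMAT 'textfile', FIELDDELIM ',')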

OPTIONS

Table options used to optimize the behavior of the table or configure HIVE tables.

Note

This clause is not supported by Delta Lake.

PARTITIONED BY (col_name1, col_name2, ...)

Partition the created table by the specified columns. A directory is created for each partition.

CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS

Each partition in the created table will be split into a fixed number of buckets by the specified columns. This is typically used with partitioning to read and shuffle less data.

LOCATION path

The directory to store the table data. This clause automatically implies EXTERNAL.

(Delta Lake on Databricks) When you specify a LOCATION that already contains data stored in Delta Lake, Delta Lake does the following:

  • If you specify only the table name and location, for example:

    CREATE TABLE events
      USING DELTA
      LOCATION '/mnt/delta/events'
    

    the table in the Hive metastore automatically inherits the schema, partitioning, and table properties of the existing data. This functionality can be used to “import” data into the metastore.

  • If you specify any configuration (schema, partitioning, or table properties), Delta Lake verifies that the specification exactly matches the configuration of the existing data.

Warning

If the specified configuration does not exactly match the configuration of the data, Delta Lake throws an exception that describes the discrepancy.
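For example, the following statement (reusing the '/mnt/delta/events' path from above, with hypothetical columns) succeeds only if the declared schema and partitioning exactly match the data already stored at that location:

CREATE TABLE events (
    date DATE,
    eventId STRING,
    eventType STRING)
  USING DELTA
  PARTITIONED BY (date)
  LOCATION '/mnt/delta/events'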

AS select_statement

Populate the table with input data from the SELECT statement. You cannot specify a column list together with this clause.

Examples

CREATE TABLE boxes (width INT, length INT, height INT) USING CSV

CREATE TABLE boxes
  (width INT, length INT, height INT)
  USING PARQUET
  OPTIONS ('compression'='snappy')

CREATE TABLE rectangles
  USING PARQUET
  PARTITIONED BY (width)
  CLUSTERED BY (length) INTO 8 BUCKETS
  AS SELECT * FROM boxes

-- Create a Hive SerDe table using the CREATE TABLE USING syntax.
CREATE TABLE my_table (name STRING, age INT, hair_color STRING)
  USING HIVE
  OPTIONS(
      INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat',
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat',
      SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe')
  PARTITIONED BY (hair_color)
  TBLPROPERTIES ('status'='staging', 'owner'='andrew')

Create Table with Hive format

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION path]
  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
  [AS select_statement]

row_format:
  : SERDE serde_cls [WITH SERDEPROPERTIES (key1=val1, key2=val2, ...)]
  | DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
      [COLLECTION ITEMS TERMINATED BY char]
      [MAP KEYS TERMINATED BY char]
      [LINES TERMINATED BY char]
      [NULL DEFINED AS char]

file_format:
  : TEXTFILE | SEQUENCEFILE | RCFILE | ORC | PARQUET | AVRO
  | INPUTFORMAT input_fmt OUTPUTFORMAT output_fmt

Create a table using the Hive format. If a table with the same name already exists in the database, an exception will be thrown. When the table is dropped later, its data will be deleted from the file system.

Note

This command is supported only when Hive support is enabled.

EXTERNAL

The table uses the custom directory specified with LOCATION. Queries on the table access existing data previously stored in the directory. When an EXTERNAL table is dropped, its data is not deleted from the file system. This flag is implied if LOCATION is specified.

IF NOT EXISTS

If a table with the same name already exists in the database, nothing will happen.

PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)

Partition the table by the specified columns. This set of columns must be distinct from the set of non-partitioned columns. You cannot specify partitioned columns with AS select_statement.

ROW FORMAT

Use the SERDE clause to specify a custom SerDe for this table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character, and so on.
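For example, a sketch of a fuller DELIMITED specification (for a hypothetical my_delimited_table) that also sets the escape character and the null marker:

CREATE TABLE my_delimited_table (name STRING, age INT)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ',' ESCAPED BY '\\'
    NULL DEFINED AS '\\N'
  STORED AS TEXTFILE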

STORED AS file_format

Specify the file format for this table. Available formats include TEXTFILE, SEQUENCEFILE, RCFILE, ORC, PARQUET, and AVRO. Alternatively, you can specify your own input and output formats through INPUTFORMAT and OUTPUTFORMAT. Only formats TEXTFILE, SEQUENCEFILE, and RCFILE can be used with ROW FORMAT SERDE and only TEXTFILE can be used with ROW FORMAT DELIMITED.

LOCATION path

The directory to store the table data. This clause automatically implies EXTERNAL.

AS select_statement

Populate the table with input data from the select statement. You cannot specify this with PARTITIONED BY.

Examples

CREATE TABLE my_table (name STRING, age INT)

CREATE TABLE my_table (name STRING, age INT)
  COMMENT 'This table is partitioned'
  PARTITIONED BY (hair_color STRING COMMENT 'This is a column comment')
  TBLPROPERTIES ('status'='staging', 'owner'='andrew')

CREATE TABLE my_table (name STRING, age INT)
  COMMENT 'This table specifies a custom SerDe'
  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
  STORED AS
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

CREATE TABLE my_table (name STRING, age INT)
  COMMENT 'This table uses the CSV format'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE

CREATE TABLE your_table
  COMMENT 'This table is created with existing data'
  AS SELECT * FROM my_table

CREATE EXTERNAL TABLE IF NOT EXISTS my_table (name STRING, age INT)
  COMMENT 'This table is created with existing data'
  LOCATION 'spark-warehouse/tables/my_existing_table'

Create Table Like

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name1 LIKE [db_name.]table_name2 [LOCATION path]

Create a managed table using the definition/metadata of an existing table or view. The created table always uses its own directory in the default warehouse location. Since Databricks Runtime 3.0, you can specify LOCATION to create an EXTERNAL table.
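For example, assuming the boxes table from the earlier examples, the first statement below creates an empty managed copy of its definition, and the second (with a hypothetical path) creates an EXTERNAL copy at that location:

CREATE TABLE boxes_copy LIKE boxes

CREATE TABLE IF NOT EXISTS boxes_external LIKE boxes LOCATION '/mnt/tables/boxes_external'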

Note

Delta Lake does not support CREATE TABLE LIKE. Instead, use CREATE TABLE AS. See AS select_statement.
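For a Delta table, a CREATE TABLE ... AS statement such as the following sketch (with hypothetical events and events_copy tables) gives a similar result:

CREATE TABLE events_copy
  USING DELTA
  AS SELECT * FROM events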