CREATE TABLE [USING] (Databricks SQL)

Defines a managed or external table, optionally using a data source.

Syntax

{ { [CREATE OR] REPLACE TABLE | CREATE TABLE [ IF NOT EXISTS ] }
  table_name
  [ column_specification ] [ USING data_source ]
  [ table_clauses ]
  [ AS query ] }

column_sepcification
  ( { column_identifier column_type [ NOT NULL ]
      [ GENERATED ALWAYS AS ( expr ) ] [ COMMENT column_comment ] } [, ...] )

table_clauses
  { OPTIONS ( { option_key [ = ] option_val } [, ...] ) |
    PARTITIONED BY clause |
    clustered_by_clause |
    LOCATION path |
    COMMENT table_comment |
    TBLPROPERTIES ( { property_key [ = ] property_val } [, ...] ) } [...]

clustered_by_clause
  { CLUSTERED BY ( cluster_column [, ...] )
    [ SORTED BY ( { sort_column [ ASC | DESC ] } [, ...] ) ]
    INTO num_buckets BUCKETS }

Parameters

  • REPLACE

    If specified replaces the table and its content if it already exists. This clause is only supported for Delta Lake tables.

    Note

    Databricks strongly recommends using REPLACE instead of dropping and re-creating Delta Lake tables.

  • IF NOT EXISTS

    If specified and a table with the same name already exists, the statement is ignored.

    IF NOT EXISTS cannot coexist with REPLACE, which means CREATE OR REPLACE TABLE IF NOT EXISTS is not allowed.

  • table_name

    The name of the table to be created. The name must not include a temporal specification. If the name is not qualified the table is created in the current database.

  • column_specification

    This optional clause defines the list of columns, their types, properties, and descriptions.

    If you do not define columns the table schema you must specify either AS query or LOCATION.

    • column_identifier

      A unique name for the column.

    • column_type

      Specifies the data type of the column. Not all data types supported by Databricks are supported by all data sources.

    • NOT NULL

      If specified the column will not accept NULL values. This clause is only supported for Delta Lake tables.

    • GENERATED ALWAYS AS ( expr )

      When you specify this clause the value of this column is determined by the specified expr.

      expr may be composed of literals, column identifiers within the table, and deterministic, built-in SQL functions or operators except:

      Also expr must not contain any subquery.

    • COMMENT column_comment

      A string literal to describe the table.

  • USING data_source

    The file format to use for the table. data_source must be one of TEXT, AVRO, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA.

    In Databricks SQL all tables other than Delta Lake tables are read only.

  • table_clauses

    Optionally specify location, partitioning, clustering, options, comments, and user defined properties for the new table. Each sub clause may only be specified once.

    • PARTITIONED BY

      An optional clause to partition the table by a subset of columns.

      Note

      Unless you define a Delta Lake table partitioning columns referencing the columns in the column specification are always moved to the end of the table.

    • clustered_by_clause

      Optionally cluster the table or each partition into a fixed number of hash buckets using a subset of the columns.

      Clustering is not supported for Delta Lake tables.

      • CLUSTERED BY

        Specifies the set of columns by which to cluster each partition, or the table if no partitioning is specified.

        • cluster_column

          An identifier referencing a column_identifier in the table. If you specify more than one column there must be no duplicates. Since a clustering operates on the partition level you must not name a partition column also as a cluster column.

      • SORTED BY

        Optionally maintains a sort order for rows in a bucket.

        • sort_column

          A column to sort the bucket by. The column must not be partition column. Sort columns must be unique.

        • ASC or DESC

          Optionally specifies whether sort_column is sorted in ascending (ASC) or descending (DESC) order. The default values is ASC.

      • INTO num_buckets BUCKETS

        An INTEGER literal specifying the number of buckets into which each partition (or the table if no partitioning is specified) is devided.

    • LOCATION path

      An optional path to the directory where table data is stored, which could be a path on distributed storage. path must be a STRING literal. If you specify no location the table is considered a managed table and Databricks creates a default table location. Specifying a location makes the table an external table.

      For a Delta Lake table the table configuration is inherited from the LOCATION if data is present. Therefore if any TBLPROPERTIES, column_specification, or PARTITION BY clauses are specified for Delta Lake tables they must exactly match the Delta Lake location data.

    • OPTIONS

      Table options to control the behavior of the table. This clause is not supported when the data source is DELTA.

      • option_key

        The key can consist of one or more identifiers separated by a dot, or a string literal.

        Option keys must be unique.

      • option_val

        The value for the option. The value must be a BOOLEAN, STRING, INTEGER, or DECIMAL literal.

    • COMMENT table_comment

      A string literal to describe the table.

    • TBLPROPERTIES

      Sets one or more user defined properties of an existing view. Properties are key value pairs.

      • property_key

        The key can consist of one or more identifiers separated by a dot, or a string literal.

        Property keys must be unique.

      • property_val

        The value for the property. The value must be a BOOLEAN, STRING, INTEGER, or DECIMAL literal.

  • AS query

    This optional clause populates the table using the data from query. When you specify a query you must not also specify a column_specification. The table schema will be derived form the query.

    Note that Databricks overwrites the underlying data source with the data of the input query, to make sure the table gets created contains exactly the same data as the input query.

Examples

-- Creates a Delta table
> CREATE TABLE student (id INT, name STRING, age INT);

-- Use data from another table
> CREATE TABLE student_copy AS SELECT * FROM student;

-- Creates a CSV table from an external directory
> CREATE TABLE student USING CSV LOCATION '/mnt/csv_files';

-- Specify table comment and properties
> CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    TBLPROPERTIES ('foo'='bar');

-- Specify table comment and properties with different clauses order
> CREATE TABLE student (id INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';

-- Create partitioned table
> CREATE TABLE student (id INT, name STRING, age INT)
    PARTITIONED BY (age);

-- Create a table with a generated column
> CREATE TABLE rectangles(a INT, b INT,
                          area INT GENERATED ALWAYS AS (a * b));