CREATE TABLE with Hive format

Applies to: check marked yes Databricks Runtime

Defines a table using Hive format.

Syntax

CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
    [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
    [ COMMENT table_comment ]
    [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
        | ( col_name1, col_name2, ... ) ]
    [ ROW FORMAT row_format ]
    [ STORED AS file_format ]
    [ LOCATION path ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
    [ AS select_statement ]

row_format:
    : SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
        [ LINES TERMINATED BY row_terminated_char ]
        [ NULL DEFINED AS null_char ]

The clauses between the column definition clause and the AS SELECT clause can appear in any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.

Note

You must specify either the STORED AS or ROW FORMAT clause. Otherwise, the SQL parser uses the CREATE TABLE [USING] syntax to parse it and creates a Delta table by default.

Parameters

  • table_identifier

    A table name, optionally qualified with a schema name.

    Syntax: [schema_name.] table_name

  • EXTERNAL

    Defines the table using the path provided in LOCATION.

  • PARTITIONED BY

    Partitions the table by the specified columns.

  • ROW FORMAT

    Use the SERDE clause to specify a custom SerDe for one table. Otherwise, use the DELIMITED clause to use the native SerDe and specify the delimiter, escape character, null character and so on.

  • SERDE

    Specifies a custom SerDe for one table.

  • serde_class

    Specifies a fully-qualified class name of a custom SerDe.

  • SERDEPROPERTIES

    A list of key-value pairs used to tag the SerDe definition.

  • DELIMITED

    The DELIMITED clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.

  • FIELDS TERMINATED BY

    Used to define a column separator.

  • COLLECTION ITEMS TERMINATED BY

    Used to define a collection item separator.

  • MAP KEYS TERMINATED BY

    Used to define a map key separator.

  • LINES TERMINATED BY

    Used to define a row separator.

  • NULL DEFINED AS

    Used to define the specific value for NULL.

  • ESCAPED BY

    Define the escape mechanism.

  • COLLECTION ITEMS TERMINATED BY

    Define a collection item separator.

  • MAP KEYS TERMINATED BY

    Define a map key separator.

  • LINES TERMINATED BY

    Define a row separator.

  • NULL DEFINED AS

    Define the specific value for NULL.

  • STORED AS

    The file format for the table. Available formats include TEXTFILE, SEQUENCEFILE, RCFILE, ORC, PARQUET, and AVRO. Alternatively, you can specify your own input and output formats through INPUTFORMAT and OUTPUTFORMAT. Only formats TEXTFILE, SEQUENCEFILE, and RCFILE can be used with ROW FORMAT SERDE and only TEXTFILE can be used with ROW FORMAT DELIMITED.

  • LOCATION

    Path to the directory where table data is stored, which could be a path on distributed storage.

  • COMMENT

    A string literal to describe the table.

  • TBLPROPERTIES

    A list of key-value pairs used to tag the table definition.

  • AS select_statement

    Populates the table using the data from the select statement.

Examples

--Use hive format
CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;

--Use data from another table
CREATE TABLE student_copy STORED AS ORC
    AS SELECT * FROM student;

--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    STORED AS ORC
    TBLPROPERTIES ('foo'='bar');

--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
    STORED AS ORC
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';

--Create partitioned table
CREATE TABLE student (id INT, name STRING)
    PARTITIONED BY (age INT)
    STORED AS ORC;

--Create partitioned table with different clauses order
CREATE TABLE student (id INT, name STRING)
    STORED AS ORC
    PARTITIONED BY (age INT);

--Use Row Format and file format
CREATE TABLE student (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

--Use complex datatype
CREATE EXTERNAL TABLE family(
        name STRING,
        friends ARRAY<STRING>,
        children MAP<STRING, INT>,
        address STRUCT<street: STRING, city: STRING>
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
    COLLECTION ITEMS TERMINATED BY '_'
    MAP KEYS TERMINATED BY ':'
    LINES TERMINATED BY '\n'
    NULL DEFINED AS 'foonull'
    STORED AS TEXTFILE
    LOCATION '/tmp/family/';

--Use predefined custom SerDe
CREATE TABLE avroExample
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES ('avro.schema.literal'='{ "namespace": "org.apache.hive",
        "name": "first_schema",
        "type": "record",
        "fields": [
                { "name":"string1", "type":"string" },
                { "name":"string2", "type":"string" }
            ] }');

--Use personalized custom SerDe(we may need to `ADD JAR xxx.jar` first to ensure we can find the serde_class,
--or you may run into `CLASSNOTFOUND` exception)
ADD JAR /tmp/hive_serde_example.jar;

CREATE EXTERNAL TABLE family (id INT, name STRING)
    ROW FORMAT SERDE 'com.ly.spark.serde.SerDeExample'
    STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
        OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
    LOCATION '/tmp/family/';