CREATE TABLE with Hive format
Applies to: Databricks Runtime
Defines a table using Hive format.
Syntax
CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
row_format:
: SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
[ MAP KEYS TERMINATED BY map_key_terminated_char ]
[ LINES TERMINATED BY row_terminated_char ]
[ NULL DEFINED AS null_char ]
The clauses between the column definition clause and the AS SELECT
clause can appear in any order. For example, you can write COMMENT table_comment
after TBLPROPERTIES
.
Note
You must specify either the STORED AS
or ROW FORMAT
clause. Otherwise, the SQL parser uses the CREATE TABLE [USING] syntax to parse it and creates a Delta table by default.
Parameters
table_identifier
A table name, optionally qualified with a schema name.
Syntax:
[schema_name.] table_name
EXTERNAL
Defines the table using the path provided in
LOCATION
.PARTITIONED BY
Partitions the table by the specified columns.
ROW FORMAT
Use the
SERDE
clause to specify a custom SerDe for one table. Otherwise, use theDELIMITED
clause to use the native SerDe and specify the delimiter, escape character, null character and so on.SERDE
Specifies a custom SerDe for one table.
serde_class
Specifies a fully-qualified class name of a custom SerDe.
SERDEPROPERTIES
A list of key-value pairs used to tag the SerDe definition.
DELIMITED
The
DELIMITED
clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.FIELDS TERMINATED BY
Used to define a column separator.
COLLECTION ITEMS TERMINATED BY
Used to define a collection item separator.
MAP KEYS TERMINATED BY
Used to define a map key separator.
LINES TERMINATED BY
Used to define a row separator.
NULL DEFINED AS
Used to define the specific value for NULL.
ESCAPED BY
Define the escape mechanism.
COLLECTION ITEMS TERMINATED BY
Define a collection item separator.
MAP KEYS TERMINATED BY
Define a map key separator.
LINES TERMINATED BY
Define a row separator.
NULL DEFINED AS
Define the specific value for
NULL
.STORED AS
The file format for the table. Available formats include
TEXTFILE
,SEQUENCEFILE
,RCFILE
,ORC
,PARQUET
, andAVRO
. Alternatively, you can specify your own input and output formats throughINPUTFORMAT
andOUTPUTFORMAT
. Only formatsTEXTFILE
,SEQUENCEFILE
, andRCFILE
can be used withROW FORMAT SERDE
and onlyTEXTFILE
can be used withROW FORMAT DELIMITED
.LOCATION
Path to the directory where table data is stored, which could be a path on distributed storage.
COMMENT
A string literal to describe the table.
TBLPROPERTIES
A list of key-value pairs used to tag the table definition.
AS select_statement
Populates the table using the data from the select statement.
Examples
--Use hive format
CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;
--Use data from another table
CREATE TABLE student_copy STORED AS ORC
AS SELECT * FROM student;
--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
COMMENT 'this is a comment'
STORED AS ORC
TBLPROPERTIES ('foo'='bar');
--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
STORED AS ORC
TBLPROPERTIES ('foo'='bar')
COMMENT 'this is a comment';
--Create partitioned table
CREATE TABLE student (id INT, name STRING)
PARTITIONED BY (age INT)
STORED AS ORC;
--Create partitioned table with different clauses order
CREATE TABLE student (id INT, name STRING)
STORED AS ORC
PARTITIONED BY (age INT);
--Use Row Format and file format
CREATE TABLE student (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
--Use complex datatype
CREATE EXTERNAL TABLE family(
name STRING,
friends ARRAY<STRING>,
children MAP<STRING, INT>,
address STRUCT<street: STRING, city: STRING>
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'foonull'
STORED AS TEXTFILE
LOCATION '/tmp/family/';
--Use predefined custom SerDe
CREATE TABLE avroExample
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{ "namespace": "org.apache.hive",
"name": "first_schema",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" },
{ "name":"string2", "type":"string" }
] }');
--Use personalized custom SerDe(we may need to `ADD JAR xxx.jar` first to ensure we can find the serde_class,
--or you may run into `CLASSNOTFOUND` exception)
ADD JAR /tmp/hive_serde_example.jar;
CREATE EXTERNAL TABLE family (id INT, name STRING)
ROW FORMAT SERDE 'com.ly.spark.serde.SerDeExample'
STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';