CREATE TABLE USING (SQL Analytics)
Defines a table using a data source.
Syntax
CREATE TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
[ USING data_source ]
[ OPTIONS ( key1=val1, key2=val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
The clauses between the USING
clause and the AS SELECT
clause can appear in
any order. For example, you can write COMMENT table_comment
after TBLPROPERTIES
.
Parameters
table_identifier
A table name, optionally qualified with a database name.
Syntax:
[ database_name. ] table_name
IF NOT EXISTS
If a table with the same name already exists, nothing will happen.
IF NOT EXISTS
cannot coexist withOR REPLACE
, which meansCREATE TABLE OR REPLACE IF NOT EXISTS
is not allowed.USING data_source
The file format to use for the table.
data_source
must be one ofTEXT
,CSV
,JSON
,JDBC
,PARQUET
,ORC
, orDELTA
. IfUSING
is omitted, the default isDELTA
.Important
- When
USING
is omitted ordata_source
isDELTA
, the additional options in Delta table options apply. - When
data_source
isTEXT
,CSV
,JSON
,JDBC
,PARQUET
, orORC
you must specifyLOCATION
andAS select_statement
is not allowed.
- When
PARTITIONED BY
Partitions the table by the specified columns.
LOCATION
Path to the directory where table data is stored, which could be a path on distributed storage.
OPTIONS
Table options used to optimize the behavior of the table. This clause is not supported when the data source is
DELTA
.COMMENT
A string literal to describe the table.
TBLPROPERTIES
A list of key-value pairs used to tag the table definition.
AS select_statement
Populates the table using the data from
select_statement
.
Delta table options
In addition to the standard CREATE TABLE
options, Delta tables support the options described in this section.
CREATE [ OR REPLACE ] ... table_identifier
[ (col_name1 col_type1 [ NOT NULL ] [ COMMENT col_comment1], ...) ]
[ USING DELTA ]
[ LOCATION <path-to-delta-files> ]
Parameters
OR REPLACE
If a table with the same name already exists, the table is replaced with the new configuration.
Note
Databricks strongly recommends using
OR REPLACE
instead of dropping and re-creating tables.table_identifier
[database_name.] table_name
: A table name, optionally qualified with a database name.delta.`<path-to-table>`
: The location of an existing Delta table.
NOT NULL
Indicate that a column value cannot be
NULL
. The default is to allow aNULL
value. If specified any change to the Delta table will check theseNOT NULL
constraints.For details, see NOT NULL constraint.
LOCATION <path-to-delta-files>
If you specify a
LOCATION
that already contains data stored in Delta Lake, Delta Lake does the following:If you specify only the table name and location, for example:
CREATE TABLE events LOCATION '/mnt/delta/events'
the table in the Hive metastore automatically inherits the schema, partitioning, and table properties of the existing data. This functionality can be used to “import” data into the metastore.
If you specify any configuration (schema, partitioning, or table properties), Delta Lake verifies that the specification exactly matches the configuration of the existing data.
Warning
If the specified configuration does not exactly match the configuration of the data, Delta Lake throws an exception that describes the discrepancy.
Data source interaction
A data source table acts like a pointer to the underlying data source. For example, you can create
a table foo
in Databricks that points to a file bar
. When you read and write table foo
, you actually read and write the file bar
. If you don’t specify the LOCATION
, Databricks creates a default table location.
For CREATE TABLE AS SELECT
, Databricks overwrites the underlying data source with the data of the
input query, to make sure the table gets created contains exactly the same data as the input query.
Examples
--Creates a Delta table
CREATE TABLE student (id INT, name STRING, age INT);
--Use data from another table
CREATE TABLE student_copy AS SELECT * FROM student;
--Creates a CSV table from an external directory
CREATE TABLE student USING CSV LOCATION '/mnt/csv_files';
--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
COMMENT 'this is a comment'
TBLPROPERTIES ('foo'='bar');
--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
TBLPROPERTIES ('foo'='bar')
COMMENT 'this is a comment';
--Create partitioned table
CREATE TABLE student (id INT, name STRING, age INT)
PARTITIONED BY (age);