Hive table

This article shows how to import a Hive table from cloud storage into Databricks using an external table.

Step 1: Show the CREATE TABLE statement

Issue a SHOW CREATE TABLE <tablename> command on your Hive command line to see the statement that created the table.

hive> SHOW CREATE TABLE wikicc;
OK
CREATE  TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '<path-to-table>'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')

Step 2: Issue a CREATE EXTERNAL TABLE statement

If the returned statement uses a CREATE TABLE command, copy it and replace CREATE TABLE with CREATE EXTERNAL TABLE, keeping the LOCATION clause so the new table points at the same data.

  • EXTERNAL ensures that Spark SQL does not delete your data if you drop the table.

  • You can omit the TBLPROPERTIES clause.

DROP TABLE wikicc;
CREATE EXTERNAL TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '<path-to-table>';
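
As a quick sanity check, you can confirm that the new table is registered as external and points at the expected location; a minimal sketch using the standard DESCRIBE FORMATTED command:

-- Inspect the table metadata; the output should show an external
-- table type and the LOCATION supplied above.
DESCRIBE FORMATTED wikicc;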

Step 3: Issue SQL commands on your data

SELECT * FROM wikicc;
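
Any query that works on a managed table works here as well. For example, given the two-column schema above, the following illustrative queries sort and aggregate the data; note that `count` is backquoted because it collides with the built-in aggregate function name:

-- Top ten rows by count.
SELECT `country`, `count`
FROM wikicc
ORDER BY `count` DESC
LIMIT 10;

-- Total of the count column across all rows.
SELECT SUM(`count`) AS total_count FROM wikicc;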