Databases and tables
A Databricks database (schema) is a collection of tables. A Databricks table is a collection of structured data. You can cache, filter, and perform any operations supported by Apache Spark DataFrames on Databricks tables. You can query tables with Spark APIs and Spark SQL.
Types of tables
There are two types of tables: global and local. A global table is available across all clusters. Databricks registers global tables either to the Databricks Hive metastore or to an external Hive metastore. For details about Hive support, see Apache Hive compatibility. A local table is not accessible from other clusters and is not registered in the Hive metastore. This is also known as a temporary view.
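For example, here is a minimal sketch in Python (assuming an existing DataFrame named df and hypothetical table names) that registers the same data both ways:
# Global table: registered in the metastore and visible to all clusters
df.write.saveAsTable("my_global_table")

# Local table (temporary view): visible only in the current SparkSession
df.createOrReplaceTempView("my_temp_view")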
View databases and tables
Click Data in the sidebar. Databricks selects a running cluster to which you have access. The Databases folder displays the list of databases with the default database selected. The Tables folder displays the list of tables in the default database.

You can change the cluster from the Databases menu, the Create Table UI, or the view table UI. For example, from the Databases menu:
Click the drop-down at the top of the Databases folder.
Select a cluster.
Create a database
To create a database in SQL:
CREATE DATABASE <database-name> ...
For more options, see
Databricks Runtime 7.x and above: CREATE DATABASE
Databricks Runtime 5.5 LTS and 6.x: Create Database
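For example, a minimal sketch in Python, using a hypothetical database name and storage path:
# Create the database only if it does not already exist,
# with an optional comment and an explicit storage location
spark.sql("""
  CREATE DATABASE IF NOT EXISTS my_database
  COMMENT 'Example database'
  LOCATION '/tmp/my_database'
""")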
Create a table
You can create a table with Create in the sidebar, the Create Table UI, or programmatically. You can populate a table from files in DBFS or data stored in any of the supported data sources.
Note
When you create a table using the UI, you cannot:
Upload a file if you are using a High Concurrency cluster. Instead, use the Databricks File System (DBFS) to load your data into Databricks.
Update the table. Instead, create a table programmatically.
Create a table using Create in the sidebar
Using the Create icon in the sidebar, you can only create global tables. To create a local table, see Create a table programmatically.
Click Create in the sidebar and select Table from the menu. The Create New Table dialog appears.
Follow the steps in Create a table using the UI, beginning at Step 4.
Create a table using the UI
With the UI, you can only create global tables. To create a local table, see Create a table programmatically.
Click Data in the sidebar. The Databases and Tables folders display.
In the Databases folder, select a database.
Above the Tables folder, click Create Table.
Choose a data source and follow the steps in the corresponding section to configure the table.
If a Databricks administrator has disabled the Upload File option, you cannot upload files, but you can create tables using one of the other data sources.
Instructions for uploading a file:
Drag files to the Files dropzone or click the dropzone to browse for and choose files. After upload, a path displays for each file. The path will be something like /FileStore/tables/<filename>-<integer>.<file-type>. You can use this path in a notebook to read data, as shown in the sketch after these steps.
Click Create Table with UI.
In the Cluster drop-down, choose a cluster.
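For example, here is a minimal sketch of reading an uploaded file in a Python notebook; the file name diamonds-1.csv is hypothetical and should be replaced with the path displayed after your upload:
# Read the uploaded CSV from its DBFS path (hypothetical file name)
df = spark.read.format("csv") \
  .option("header", "true") \
  .option("inferSchema", "true") \
  .load("/FileStore/tables/diamonds-1.csv")
display(df)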
Instructions for S3:
Click Create Table with UI.
In the Cluster drop-down, choose a cluster.
Enter a bucket name. The S3 bucket must be accessible from the cluster you selected.
Click Browse Bucket.
Select a file.
Instructions for DBFS:
Select a file.
Click Create Table with UI.
In the Cluster drop-down, choose a cluster.
Click Preview Table to view the table.
In the Table Name field, optionally override the default table name. A table name can contain only lowercase alphanumeric characters and underscores and must start with a lowercase letter or underscore.
In the Create in Database field, optionally override the selected default database.
In the File Type field, optionally override the inferred file type.
If the file type is CSV:
In the Column Delimiter field, select whether to override the inferred delimiter.
Indicate whether to use the first row as the column titles.
Indicate whether to infer the schema.
If the file type is JSON, indicate whether the file is multi-line.
Click Create Table.
Create a table in a notebook
In the Create New Table UI you can use quickstart notebooks provided by Databricks to connect to any data source.
S3: Click Create Table in Notebook. The S3 bucket must be accessible from the cluster to which the notebook is attached.
DBFS: Click Create Table in Notebook.
Other Data Sources: In the Connector drop-down, select a data source type. Then click Create Table in Notebook.
Create a table programmatically
This section describes how to create global and local tables programmatically.
Create a global table
To create a global table in SQL:
CREATE TABLE <table-name> ...
For more options, see Create Table for Databricks Runtime 5.5 LTS and Databricks Runtime 6.4, or CREATE TABLE for Databricks Runtime 7.1 and above.
To create a global table from a DataFrame in Python or Scala:
dataFrame.write.saveAsTable("<table-name>")
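For example, a minimal end-to-end sketch in Python using hypothetical data and a hypothetical table name:
# Build a small DataFrame and register it as a global table in the metastore
dataFrame = spark.createDataFrame([("a", "1"), ("b", "2")], ["id", "value"])
dataFrame.write.saveAsTable("example_global_table")

# The table can now be queried by name, including from other clusters
display(spark.table("example_global_table"))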
Create a local table
To create a local table from a DataFrame in Python or Scala:
dataFrame.createOrReplaceTempView("<table-name>")
Here is an example that creates a local table called diamonds from a file in Databricks File System (DBFS):
# Path to the source CSV file in DBFS
csv_path = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"

spark.read.format("csv").option("header", "true")\
  .option("inferSchema", "true").load(csv_path)\
  .createOrReplaceTempView("diamonds")
Access a table
You can view table details and read, update, and delete tables.
View table details
The table details view shows the table schema and sample data.
Click Data in the sidebar.
In the Databases folder, click a database.
In the Tables folder, click the table name.
In the Cluster drop-down, optionally select another cluster to render the table preview.
Note
To display the table preview, a Spark SQL query runs on the cluster selected in the Cluster drop-down. If the cluster already has a workload running on it, the table preview may take longer to load.
Query a table
These examples show you how to query and display a table called diamonds.

SQL

SELECT * FROM diamonds

Python

diamonds = spark.sql("select * from diamonds")
display(diamonds.select("*"))

diamonds = spark.table("diamonds")
display(diamonds.select("*"))

R

diamonds <- sql(sqlContext, "select * from diamonds")
display(diamonds)

diamonds <- table(sqlContext, "diamonds")
display(diamonds)

Scala

val diamonds = spark.sql("select * from diamonds")
display(diamonds.select("*"))

val diamonds = spark.table("diamonds")
display(diamonds.select("*"))
Update a table
The table schema is immutable. However, you can update table data by changing the underlying files.
For example, for tables created from an S3 directory, adding or removing files in that directory changes the contents of the table.
After updating the files underlying a table, refresh the table using the following command:
REFRESH TABLE <table-name>
This ensures that when you access the table, Spark SQL reads the correct files even if the underlying files change.
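For example, a minimal sketch in Python, assuming an unmanaged table whose data lives at a hypothetical <storage-location>:
# Add new data files to the directory that backs the table
new_rows = spark.createDataFrame([("c", "3")], ["id", "value"])
new_rows.write.mode("append").parquet("<storage-location>")

# Refresh so Spark SQL picks up the new files on the next read
spark.sql("REFRESH TABLE <example-table>")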
Delete a table
Managed and unmanaged tables
Every Spark SQL table has metadata that stores the schema, in addition to the data itself.
A managed table is a Spark SQL table for which Spark manages both the data and the metadata. In the case of a managed table, Databricks stores the metadata and data in DBFS in your account. Because Spark SQL manages the table, DROP TABLE <example-table> deletes both the metadata and the data.
Some common ways of creating a managed table are:
CREATE TABLE <example-table>(id STRING, value STRING)
dataframe.write.saveAsTable("<example-table>")
Another option is to let Spark SQL manage the metadata, while you control the data location. We refer to this as an unmanaged table.
Spark SQL manages the relevant metadata, so when you perform DROP TABLE <example-table>, Spark removes only the metadata and not the data itself. The data is still present in the path you provided.
You can create an unmanaged table with your data in data sources such as Cassandra or a JDBC table. See Data sources for more information about the data sources supported by Databricks. Some common ways of creating an unmanaged table are:
CREATE TABLE <example-table>(id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "<your-storage-path>")
dataframe.write.option('path', "<your-storage-path>").saveAsTable("<example-table>")
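The difference in DROP TABLE behavior can be illustrated with a minimal Python sketch (hypothetical names and path; dataframe is an existing DataFrame):
# Unmanaged table: Spark SQL tracks only the metadata for this path
dataframe.write.option("path", "<your-storage-path>").saveAsTable("<example-table>")

# Dropping the table removes the metastore entry only
spark.sql("DROP TABLE <example-table>")

# The underlying files remain at the storage path
display(dbutils.fs.ls("<your-storage-path>"))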
Replace table contents
There are two main approaches to replacing table contents: simple and recommended.
Simple way to replace table contents
The simplest way to replace table contents is to delete the table metadata and data and create another table.
Managed table
DROP TABLE IF EXISTS <example-table>   -- deletes the metadata and data
CREATE TABLE <example-table> AS SELECT ...
Unmanaged table
DROP TABLE IF EXISTS <example-table>   -- deletes the metadata
dbutils.fs.rm("<your-s3-path>", true)  // deletes the data (Scala)
CREATE TABLE <example-table> USING org.apache.spark.sql.parquet OPTIONS (PATH "<your-s3-path>") AS SELECT ...
Warning
Although the above approach is valid, because all of the data is on S3, you might run into S3 eventual consistency issues if you delete the files and immediately try to recreate the table in the same location. You can read more about consistency issues in the blog post S3mper: Consistency in the Cloud.
An alternative is to:
Create the table using the SQL DDL:
CREATE TABLE <table-name> (id long, date string) USING PARQUET LOCATION "<storage-location>"
Store new data in <storage-location>.
Run REFRESH TABLE <table-name>.
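A minimal sketch of these steps in Python, assuming the new data should replace the files at the hypothetical <storage-location>:
# 1. Create the table over an explicit storage location
spark.sql("CREATE TABLE <table-name> (id LONG, date STRING) USING PARQUET LOCATION '<storage-location>'")

# 2. Write the new data to that location
dataframe.write.mode("overwrite").parquet("<storage-location>")

# 3. Refresh so subsequent queries read the new files
spark.sql("REFRESH TABLE <table-name>")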
Recommended way to replace table contents
To avoid potential consistency issues, the best approach to replacing table contents is to overwrite the table.
Python
dataframe.write.mode("overwrite").saveAsTable("<example-table>")  # Managed overwrite
dataframe.write.mode("overwrite").option("path", "<your-s3-path>").saveAsTable("<example-table>")  # Unmanaged overwrite

SQL
Use the INSERT OVERWRITE keyword. This method applies to managed and unmanaged tables. For example, for an unmanaged table:
CREATE TABLE <example-table>(id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "<your-s3-path>")
INSERT OVERWRITE TABLE <example-table> SELECT ...

Scala
dataframe.write.mode(SaveMode.Overwrite).saveAsTable("<example-table>")  // Managed overwrite
dataframe.write.mode(SaveMode.Overwrite).option("path", "<your-s3-path>").saveAsTable("<example-table>")  // Unmanaged overwrite
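The overwrite approach can also be sketched end to end in Python (hypothetical data and table name):
# Replace the contents of a managed table without dropping it first
updated = spark.createDataFrame([("a", "10"), ("b", "20")], ["id", "value"])
updated.write.mode("overwrite").saveAsTable("example_table")

display(spark.table("example_table"))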
Partitioned tables
Spark SQL is able to generate partitions dynamically at the file storage level to provide partition columns for tables.
Create a partitioned table
These examples partition data that you write. Spark SQL discovers the partitions and registers them in the Hive metastore.
// Create managed table as select
dataframe.write.mode(SaveMode.Overwrite).partitionBy("id").saveAsTable("<example-table>")
// Create unmanaged/external table as select
dataframe.write.mode(SaveMode.Overwrite).partitionBy("id").option("path", "<file-path>").saveAsTable("<example-table>")
However, if you create a partitioned table from existing data, Spark SQL does not automatically discover the partitions and register them in the Hive metastore. In this case, SELECT * FROM <example-table> does not return results. To register the partitions, run the following command to generate them: MSCK REPAIR TABLE <example-table>.
// Save data to external files
dataframe.write.mode(SaveMode.Overwrite).partitionBy("id").parquet("<file-path>")
// Create unmanaged/external table
spark.sql("CREATE TABLE <example-table> (id STRING, value STRING) USING parquet PARTITIONED BY (id) LOCATION '<file-path>'")
spark.sql("MSCK REPAIR TABLE <example-table>")
Partition pruning
When the table is scanned, Spark pushes down the filter predicates involving the partitionBy keys. In that case, Spark avoids reading data that doesn't satisfy those predicates. For example, suppose you have a table <example-data> that is partitioned by <date>. A query such as SELECT max(id) FROM <example-data> WHERE date = '2010-10-10' reads only the data files containing tuples whose date value matches the one specified in the query.
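A minimal sketch in Python of how to confirm pruning, assuming a hypothetical table partitioned by date; the PartitionFilters entry in the physical plan typically shows which partitions are read:
# Filter on the partition column and inspect the physical plan
pruned = spark.table("<example-data>").where("date = '2010-10-10'")
pruned.explain()  # look for PartitionFilters in the FileScan node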
Table access control
Table access control allows admins and users to give fine-grained access to other users. See Data object privileges for details.
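For example, a minimal sketch (hypothetical table name and user; requires a cluster with table access control enabled):
# Grant read-only access on a table to a specific user
spark.sql("GRANT SELECT ON TABLE <table-name> TO `user@example.com`")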