Databases and tables

A Databricks database is a collection of tables. A Databricks table is a collection of structured data. You can cache, filter, and perform any operations supported by Apache Spark DataFrames on Databricks tables. You can query tables with Spark APIs and Spark SQL.

There are two types of tables: global and local. A global table is available across all clusters. Databricks registers global tables either to the Databricks Hive metastore or to an external Hive metastore. For details about Hive support, see Apache Hive compatibility. A local table is not accessible from other clusters and is not registered in the Hive metastore. This is also known as a temporary view.

You can create a table using the Create Table UI or programmatically. A table can be populated from files in DBFS or data stored in any of the supported data sources.


To view and create databases and tables, you must have a running cluster.

View databases and tables

Click Data Icon in the sidebar. Databricks selects a running cluster to which you have access. The Databases folder displays the list of databases with the default database selected. The Tables folder displays the list of tables in the default database.

Create tables list

You can change the cluster from the Databases menu, create table UI, or view table UI. For example, from the Databases menu:

  1. Click the Down Caret at the top of the Databases folder.

  2. Select a cluster.

    Select cluster

Create a database

To create a database in SQL:

CREATE DATABASE <database-name> ...

For more options, see

Create a table

You can create a table using the UI or programmatically.

Create a table using the UI


When you create a table using the UI, you cannot

When you create a table using the UI, you create a global table.

  1. Click Data Icon in the sidebar. The Databases and Tables folders display.

  2. In the Databases folder, select a database.

  3. Above the Tables folder, click Add Data.

    Add Table Icon

  4. Choose a data source and follow the steps to configure the table.

    The Upload File option is enabled by default. If a Databricks administrator has disabled this feature, you will not have the option to upload files but can create tables using files in another data source.

    Configure table

    Upload File

    1. Drag files to the File dropzone or click the dropzone to browse to and choose files. After upload, a path displays for each file. The path will be something like /FileStore/tables/<filename>-<random-number>.<file-type> and you use this path in a notebook to read data.

      File dropzone

      1. Click Create Table with UI.
      2. In the Cluster drop-down, choose a cluster.


      1. Click Create Table with UI.
      2. In the Cluster drop-down, choose a cluster.
      3. Enter a bucket name. The S3 bucket must be accessible from the cluster you selected.
      4. Click Browse Bucket.
      5. Select a file.


      1. Select a file.
      2. Click Create Table with UI.
      3. In the Cluster drop-down, choose a cluster.
  5. Click Preview Table to view the table.

  6. In the Table Name field, optionally override the default table name. A table name can contain only lowercase alphanumeric characters and underscores and must start with a lowercase letter or underscore.

  7. In the Create in Database field, optionally override the selected default database.

  8. In the File Type field, optionally override the inferred file type.

  9. If the file type is CSV:

    1. In the Column Delimiter field, select whether to override the inferred delimiter.
    2. Indicate whether to use the first row as the column titles.
    3. Indicate whether to infer the schema.
  10. If the file type is JSON, indicate whether the file is multi-line.

  11. Click Create Table.

Create a table in a notebook

In the Create New Table UI you can use quickstart notebooks provided by Databricks to connect to any data source.

  • S3: Click Create Table in Notebook. The S3 bucket must be accessible from the cluster to which the notebook is attached.
  • DBFS: Click Create Table in Notebook.
  • Other Data Sources: In the Connector drop-down, select a data source type. Then click Create Table in Notebook.

Create a table programmatically

This section describes how to create global and local tables programmatically.

Create a global table

To create a global table in SQL:

CREATE TABLE <table-name> ...

For more options, see Create Table for Databricks Runtime 5.5 LTS and Databricks Runtime 6.4, or CREATE TABLE for Databricks Runtime 7.1 and above.

To create a global table from a DataFrame in Python or Scala:


Create a local table

To create a local table from a DataFrame in Python or Scala:


Here is an example that creates a local table called diamonds from a file in Databricks File System (DBFS):

dataFrame = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv""csv").option("header","true")\
  .option("inferSchema", "true").load(dataFrame)\

Access a table

You can view table details, read, update, and delete a table.

View table details

The table details view shows the table schema and sample data.

  1. Click Data Icon in the sidebar.

  2. In the Databases folder, click a database.

  3. In the Tables folder, click the table name.

  4. In the Cluster drop-down, optionally select another cluster to render the table preview.

    Table details


    To display the table preview, a Spark SQL query runs on the cluster selected in the Cluster drop-down. If the cluster already has a workload running on it, the table preview may take longer to load.

Query a table

These examples show you how to query and display a table called diamonds.

SELECT * FROM diamonds
diamonds = spark.sql("select * from diamonds")

diamonds = spark.table("diamonds")
diamonds <- sql(sqlContext, "select * from diamonds")

diamonds <- table(sqlContext, "diamonds")
val diamonds = spark.sql("select * from diamonds")

val diamonds = spark.table("diamonds")

Update a table

The table schema is immutable. However, you can update table data by changing the underlying files.

For example, for tables created from an S3 directory, adding or removing files in that directory changes the contents of the table.

After updating the files underlying a table, refresh the table using the following command:

REFRESH TABLE <table-name>

This ensures that when you access the table, Spark SQL reads the correct files even if the underlying files change.

Delete a table

Delete a table using the UI

  1. Click Data Icon in the sidebar.
  2. Click the Menu Dropdown next to the table name and select Delete.

Delete a table programmatically

DROP TABLE <table-name>

Managed and unmanaged tables

Every Spark SQL table has metadata information that stores the schema and the data itself.

A managed table is a Spark SQL table for which Spark manages both the data and the metadata. In the case of managed table, Databricks stores the metadata and data in DBFS in your account. Since Spark SQL manages the tables, doing a DROP TABLE example_data deletes both the metadata and data.

Some common ways of creating a managed table are:

CREATE TABLE <example-table>(id STRING, value STRING)

Another option is to let Spark SQL manage the metadata, while you control the data location. We refer to this as an unmanaged table. Spark SQL manages the relevant metadata, so when you perform DROP TABLE <example-table>, Spark removes only the metadata and not the data itself. The data is still present in the path you provided.

You can create an unmanaged table with your data in data sources such as Cassandra, JDBC table, and so on. See Data sources for more information about the data sources supported by Databricks. Some common ways of creating an unmanaged table are:

CREATE TABLE <example-table>(id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "<your-storage-path>")
dataframe.write.option('path', "<your-storage-path>").saveAsTable("<example-table>")

Replace table contents

There are two main approaches to replacing table contents: simple and recommended.

Simple way to replace table contents

The simplest way replace table contents is to delete the table metadata and data and create another table.

Managed table
DROP TABLE IF EXISTS <example-table>     // deletes the metadata and data
CREATE TABLE <example-table> AS SELECT ...
Unmanaged table
DROP TABLE IF EXISTS <example-table>    // deletes the metadata
dbutils.fs.rm("<your-s3-path>", true)   // deletes the data
CREATE TABLE <example-table> USING org.apache.spark.sql.parquet OPTIONS (PATH "<your-s3-path>") AS SELECT ...


Although the above approach is valid, since all data is on S3, you might run into S3 eventual consistency issues if you try to delete and immediately try to recreate it in the same location. You can read more about consistency issues in the blog S3mper: Consistency in the Cloud.

An alternative is to:

  1. Create the table using the SQL DDL:

    CREATE TABLE <table-name> (id long, date string) USING PARQUET LOCATION "<storage-location>"
  2. Store new data in <storage-location>.

  3. Run refresh table <table-name>.

Partitioned tables

Spark SQL is able to generate partitions dynamically at the file storage level to provide partition columns for tables.

Create a partitioned table

These examples partition data that you write. Spark SQL discovers the partitions and registers them in the Hive metastore.

// Create managed table as select

// Create unmanaged/external table as select
dataframe.write.mode(SaveMode.Overwrite).option("path", "<file-path>").saveAsTable("<example-table>")

However, if you create a partitioned table from existing data, Spark SQL does not automatically discover the partitions and register them in the Hive metastore. In this case, SELECT * FROM <example-table> does not return results. To register the partitions, run the following to generate the partitions: MSCK REPAIR TABLE "<example-table>".

// Save data to external files

// Create unmanaged/external table
spark.sql("CREATE TABLE <example-table>(id STRING, value STRING) USING parquet PARTITIONED BY(id) LOCATION "<file-path>"")
spark.sql("MSCK REPAIR TABLE "<example-table>"")

Partition pruning

When the table is scanned, Spark pushes down the filter predicates involving the partitionBy keys. In that case, Spark avoids reading data that doesn’t satisfy those predicates. For example, suppose you have a table <example-data> that is partitioned by <date>. A query such as SELECT max(id) FROM <example-data> WHERE date = '2010-10-10' reads only the data files containing tuples whose date value matches the one specified in the query.

Table access control

Table access control allow admins and users to give fine-grained access to other users. See Data object privileges for details.