DBeaver integration with Databricks

DBeaver is a local, multi-platform database tool for developers, database administrators, data analysts, data engineers, and others who need to work with databases. DBeaver supports Databricks as well as other popular databases.

This article describes how to use your local development machine to install, configure, and use the free, open source DBeaver Community Edition (CE) to work with databases in Databricks.

Note

This article was tested with macOS, Databricks JDBC Driver version 2.6.17, and DBeaver CE version 21.0.4.

Requirements

Before you install DBeaver, your local development machine must meet the following requirements:

  • A Linux 64-bit, macOS, or Windows 64-bit operating system. (Linux 32-bit is supported but not recommended.)
  • Download the Databricks JDBC Driver onto your local development machine, extracting the SparkJDBC42.jar file from the downloaded SimbaSparkJDBC42.zip file.

You must also have a Databricks cluster or SQL endpoint to connect DBeaver to.

Step 1: Install DBeaver

Download and install DBeaver CE as follows:

  • Linux: Download and run one of the Linux installers from the Download page on the DBeaver website. snap and flatpak installation options are provided on this page as well.
  • macOS: Use Homebrew to run brew install --cask dbeaver-community, or use MacPorts to run sudo port install dbeaver-community. A macOS installer is also available from the Download page on the DBeaver website.
  • Windows: Use Chocolatey to run choco install dbeaver. A Windows installer is also available from the Download page on the DBeaver website.

Step 2: Configure the Databricks JDBC Driver for DBeaver

Set up DBeaver with information about the Databricks JDBC Driver that you downloaded earlier.

  1. Start DBeaver.
  2. If you are prompted to create a new database, click No.
  3. If you are prompted to connect to or select a database, click Cancel.
  4. Click Database > Driver Manager.
  5. In the Driver Manager dialog box, click New.
  6. In the Create new driver dialog box, click the Libraries tab.
  7. Click Add File.
  8. Browse to the SparkJDBC42.jar file that you extracted earlier and click Open.
  9. Click Find Class.
  10. In the Driver class list, confirm that com.simba.spark.jdbc.Driver is selected.
  11. On the Settings tab, for Driver Name, enter Databricks.
  12. On the Settings tab, for Class Name, enter com.simba.spark.jdbc.Driver.
  13. Click OK.
  14. In the Driver Manager dialog box, click Close.

Step 3: Connect DBeaver to your Databricks databases

Use DBeaver to connect to the cluster or SQL endpoint to access the databases in your Databricks workspace.

  1. In DBeaver, click Database > New Database Connection.

  2. In the Connect to a database dialog box, on the All tab, click Databricks, and then click Next.

  3. Click the Main tab and enter a value in the JDBC URL field for your Databricks resource:

    1. Find the JDBC URL value on the JDBC/ODBC tab in the Advanced Options area for your cluster. The JDBC URL should look similar to the following example:

      jdbc:spark://dbc-a1b2345c-d6e7.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/1234-567890-reef123;AuthMech=3;UID=token;PWD=<personal-access-token>

    2. Replace <personal-access-token> with your personal access token for the Databricks workspace.

    3. Check Save password locally.

    Tip

    If you do not want to store your personal access token on your local development machine, omit UID=token;PWD=<personal-access-token> from the JDBC URL and uncheck Save password locally. You will be prompted for your Username (the word token) and Password (your personal access token) each time you try to connect.

    1. Find the JDBC URL value on the Connection Details tab for your SQL endpoint. The JDBC URL should look similar to the following example:

      jdbc:spark://dbc-a1b2345c-d6e7.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/a123456bcde7f890;

    2. For Username, enter the word token.

    3. For Password, enter your personal access token.

    4. Check Save password locally.

    Tip

    If you do not want to store your personal access token on your local development machine, leave Username and Password blank and uncheck Save password locally. You will be prompted for your Username (the word token) and Password (your personal access token) each time you try to connect.

  4. Click Test Connection.

    Tip

    You should start your Databricks resource before testing your connection. Otherwise the test might take several minutes to complete while the resource starts.

  5. If the connection succeeds, in the Connection Test dialog box, click OK.

  6. In the Connect to a database dialog box, click Finish.

In the Database Navigator window, a New Connection entry is displayed. To change the connection’s name to make it easier to identify:

  1. Right-click New Connection, and then click Edit connection.
  2. In the Connection settings dialog box, click General.
  3. For Connection name, replace New Connection with a different name for the connection.
  4. Click OK.

Repeat the instructions in this step for each resource that you want DBeaver to access.

Step 4: Use DBeaver to browse databases and their objects

Use DBeaver to access databases and database objects in your Databricks workspace such as tables and table properties, views, indexes, data types, and other object types.

  1. In DBeaver, in the Database Navigator window, right-click the connection that you want to use.

  2. If Connect is enabled, click it. (If Connect is disabled, you are already connected.)

    Tip

    You should start your resource before trying to connect to it. Otherwise the connection might take several minutes to complete while the resource starts.

  3. Expand the connection that you just connected to.

  4. Expand Spark.

  5. Expand the database that you want to browse.

  6. Expand Tables.

  7. Double-click a table.

  8. Within the table’s Properties tab, click the Columns tab.

Repeat the instructions in this step to access additional databases and database objects.

Step 5: Use DBeaver to run SQL statements

Use DBeaver to load the sample diamonds table from the Databricks datasets into the default database in your workspace and then query the table. For more information, see Create a table. If you do not want to load a sample table, skip ahead to Next steps.

  1. In DBeaver, in the Database Navigator window, right-click the connection that you want to use.

  2. If Connect is enabled, click it. (If Connect is disabled, you are already connected.)

    Tip

    You should start your resource before trying to connect to it. Otherwise the connection might take several minutes to complete while the resource starts.

  3. Click SQL Editor > New SQL Script.

  4. On the (connection-name) Script-1 tab, enter these SQL statements, which deletes a table named diamonds if it exists, and then creates a table named diamonds based on the contents of the CSV file in the Databricks File System (DBFS) mount point:

    DROP TABLE IF EXISTS diamonds;
    
    CREATE TABLE diamonds USING CSV OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true");
    
  5. Click SQL Editor > Execute SQL Statement.

  6. In the Database Navigator window, expand Spark.

  7. Right-click the default database and click Refresh.

  8. Expand Tables, and then double-click diamonds.

  9. Within the diamonds tab, click the Data tab to see the table’s data.

To delete the diamonds table:

  1. Click SQL Editor > New SQL Script.

  2. On the (connection-name) Script-2 tab, enter this SQL statement, which deletes the diamonds table.

    DROP TABLE IF EXISTS diamonds;
    
  3. On the SQL Editor menu, click Execute SQL Statement.

  4. In the Database Navigator window, right-click the default database and then click Refresh. The diamonds table disappears from the list of tables.

Next steps