This article describes how to use Looker with the Databricks Data Science & Engineering workspace. Looker is not supported in Databricks SQL.

Step 1: Get Databricks connection information

  1. Get a personal access token.
  2. Get the server hostname, port, and HTTP path.

Step 2: Configure Databricks cluster connection in Looker

  1. In Looker, go to Admin > Connections > New Database Connection.

    Cluster connection parameters
  2. Enter a name and Apache Spark dialect.

  3. In the Host and Port fields, enter the information you retrieved in Step 1.

  4. Enter token in the Username field and the token from Step 2 in the Password field.

  5. If you want to translate queries into other time zones, adjust Query Time Zone.

  6. Set Additional Params to looker_db;AuthMech=3;transportMode=http;ssl=1;httpPath=, appending the HTTP path from Step 1.

  7. For the remaining fields, keep the defaults:

    • Do not enable Persistent Derived Tables.
    • Keep the Max Connections and Connection Pool Timeout defaults.
    • Leave Database Time Zone blank (assuming that you are storing everything in UTC).

For more information, see the Looker documentation.

Step 3: Begin modeling your database in Looker by creating a project and running the generator

This step assumes that there are permanent tables stored in the default database of your cluster.

  1. If necessary, apply Developer Mode by toggling the Dev button from OFF to ON.
  2. Go to LookML > Manage Projects.
  3. Click New LookML Project.
  4. Configure the new project.
    • Give the project a name.
    • Select Generate Model & Views.
    • Select the Connection name that you provided when you created the database connection.
    • Select All Tables.
    • Set Schemas to default, unless you have other databases to model in the cluster.
  5. Click Create Project.

After you create the project and the generator runs, Looker displays a user interface with one model file and multiple view files. The model file shows the tables in the schema and any discovered join relations between them, and the view files list each dimension (column) available for each table in the schema.

Enable and manage persistent derived tables (PDTs)

Looker can reduce query times and database loads by creating persistent derived tables (PDTs). A PDT is a derived table that Looker writes into a scratch schema in your database. Looker then regenerates the PDT on the schedule that you specify. For more information, see Persistent derived tables (PDTs) in the Looker documentation.

To enable PDTs for a database connection, select Persistent Derived Tables for that connection and complete the on-screen instructions. For more information, see Persistent Derived Tables and Configuring Separate Login Credentials for PDT Processes in the Looker documentation.

When PDTs are enabled, by default Looker regenerates PDTs every 5 minutes by connecting to the associated database. Looker restarts the associated Databricks resource if it is stopped. To change this default frequency, set the PDT And Datagroup Maintenance Schedule field for your database connection to a valid cron expression. For more information, see PDT and Datagroup Maintenance Schedule in the Looker documentation.

To enable PDTs or to change the PDT regeneration frequency for an existing database connection, click Admin > Database Connections, click Edit next to your database connection, and follow the preceding instructions.