Power BI

Microsoft Power BI is a business analytics service that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards by themselves without having to depend on information technology staff or database administrators.

When you use Databricks as a data source with Power BI, you can bring the advantages of Databricks performance and technology beyond data scientists and data engineers to all business users.

You can connect Power BI Desktop to your Databricks clusters using the built-in Spark connector. As a bonus, this connector lets you use DirectQuery to offload processing to Databricks, which is great when you have a massive amount of data that you don’t want to load into PowerBI or when you want to perform near real-time analysis.

Requirements

  • Download and install the Power BI Desktop client. See Get Power BI Desktop in the Microsoft documentation.
  • Get a personal access token for Databricks API access. See Authentication.

Connect Power BI Desktop to a Databricks cluster

You must first get the JDBC connection information for your cluster and then provide that information as a server address when you configure the connection in Power BI Desktop.

Step 1: Get the JDBC server address

  1. In Databricks, go to Clusters and select the cluster you want to connect to.

  2. On the cluster edit page, scroll down and select the JDBC/ODBC tab.

    no-alternative-text
  3. On the JDBC/ODBC tab, copy and save the JDBC URL.

  4. Construct the JDBC server address that you will use when you set up your Spark cluster connection in Power BI Desktop. Take the JDBC URL that you copied and saved in step 3 and do the following:

    • Replace jdbc:spark with https.
    • Remove everything in the path between the port number and sql, retaining the components indicated by the boxes in the image below.
    no-alternative-text

    In our example, the server address would be:

    https://<databricks-instance>.cloud.databricks.com/sql/protocolv1/o/0/0524-220842-xxxxxxx
    

    or, if you choose the aliased version:

    https://<databricks-instance>.cloud.databricks.com:443/sql/protocolv1/o/0/54-ml
    

Step 2: Configure and make the connection in Power BI Desktop

  1. Launch Power BI Desktop, click Get Data in the toolbar, and click More….

    no-alternative-text
  2. In the Get Data dialog, search for and select the Spark connector.

    no-alternative-text
  3. Click Connect.

  4. On the Spark dialog, configure your cluster connection.

    no-alternative-text
    • Server: Enter the server address that you constructed from the JDBC URL in Step 1.
    • Protocol: Select HTTP.
    • Data Connectivity mode: Select DirectQuery, which lets you offload processing to Spark. This is ideal when you have a large volume of data or when you want near real-time analysis.
  5. Click OK.

  6. On the next dialog, enter the word token in the User name field and a personal access token in the Password field.

    no-alternative-text
  7. Click Connect.

    The Power BI Navigator should display the data available for query in your Databricks cluster.

    no-alternative-text