Power BI

Microsoft Power BI is a business analytics service that provides interactive visualizations and self-service business intelligence capabilities, so end users can create reports and dashboards on their own without depending on information technology staff or database administrators.

When you use Databricks as a data source with Power BI, you can bring the advantages of Databricks performance and technology beyond data scientists and data engineers to all business users.

You can connect Power BI Desktop to your Databricks clusters using the built-in Spark connector. The connector also supports DirectQuery, which offloads processing to Databricks. This is useful when you have a massive amount of data that you don’t want to load into Power BI or when you want to perform near real-time analysis.

This article describes how to use Power BI with Databricks.

Step 1: Download and install software

Download and install the following:

Step 2: Get Databricks connection information

  1. Get a personal access token.

  2. Get your cluster’s server hostname, port, and HTTP path using the instructions in Server hostname, port, HTTP path, and JDBC URL.

  3. Construct the server address to use in your Spark cluster connection in Power BI Desktop:

    1. Use the scheme https://.

    2. Append the server hostname after the scheme.

    3. Append the HTTP path after the server hostname. For example:

      https://<server-hostname>/sql/protocolv1/o/0/0123-175621-kabob884
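
The three sub-steps above can be sketched as a small helper. This is only an illustrative sketch: the function name build_server_address and the hostname example.cloud.databricks.com are hypothetical placeholders, not values from this article. Substitute your own cluster's server hostname and HTTP path.

```python
def build_server_address(server_hostname: str, http_path: str) -> str:
    """Join the https:// scheme, the server hostname, and the HTTP path.

    Hypothetical helper for illustration; it only concatenates strings
    and does not contact Databricks.
    """
    host = server_hostname.strip().rstrip("/")
    if not host:
        raise ValueError("server hostname is empty")
    if "://" in host:
        raise ValueError("pass the bare hostname, without a scheme")

    # Tolerate an HTTP path copied with or without a leading slash.
    path = http_path.strip().lstrip("/")
    if not path:
        raise ValueError("HTTP path is empty")

    return f"https://{host}/{path}"


if __name__ == "__main__":
    # Placeholder hostname; the HTTP path is the example shown above.
    print(build_server_address(
        "example.cloud.databricks.com",
        "sql/protocolv1/o/0/0123-175621-kabob884",
    ))
```

The resulting string is the value to enter in the Server field in Step 3.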
      

Step 3: Configure the connection to a Databricks cluster in Power BI Desktop

  1. Launch Power BI Desktop, click Get Data in the toolbar, and click More….

  2. In the Get Data dialog, search for and select the Spark connector.

  3. Click Connect.

  4. On the Spark dialog, configure your cluster connection.

    • Server: Enter the server address that you constructed in Step 2.
    • Protocol: Select HTTP.
    • Data Connectivity mode: Select DirectQuery, which lets you offload processing to Spark. This is ideal when you have a large volume of data or when you want near real-time analysis.
  5. Click OK.

  6. In the User name field, enter the literal string token. In the Password field, enter the personal access token from Step 2.

  7. Click Connect. The Power BI Navigator should display the data available for query in your Databricks cluster.

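
Before clicking Connect, it can help to double-check the values typed into the dialogs. The sketch below encodes the settings from this step as plain checks. The class SparkConnectionSettings, the function check_settings, and the sample values are hypothetical and exist only for illustration; nothing here talks to Power BI or Databricks.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass
class SparkConnectionSettings:
    """Hypothetical container mirroring the fields in the Spark dialog."""
    server: str
    protocol: str
    connectivity_mode: str
    user_name: str
    password: str


def check_settings(s: SparkConnectionSettings) -> List[str]:
    """Return a list of problems; an empty list means the values match this article."""
    problems = []
    parts = urlsplit(s.server)
    if parts.scheme != "https":
        problems.append("Server must start with https://")
    if not parts.netloc:
        problems.append("Server is missing the server hostname")
    if not parts.path.strip("/"):
        problems.append("Server is missing the HTTP path")
    if s.protocol != "HTTP":
        problems.append("Protocol should be HTTP")
    if s.connectivity_mode != "DirectQuery":
        problems.append("Data Connectivity mode should be DirectQuery")
    if s.user_name != "token":
        problems.append("User name must be the literal string 'token'")
    if not s.password:
        problems.append("Password must be the personal access token")
    return problems


if __name__ == "__main__":
    settings = SparkConnectionSettings(
        server="https://example.cloud.databricks.com/sql/protocolv1/o/0/0123-175621-kabob884",
        protocol="HTTP",
        connectivity_mode="DirectQuery",
        user_name="token",
        password="<personal-access-token>",  # placeholder, not a real token
    )
    print(check_settings(settings))
```

An empty list means the values are consistent with the steps above. It does not confirm that the cluster is running or that the token is valid.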