Microsoft Power BI is a business analytics service that provides interactive visualizations and self-service business intelligence capabilities, enabling end users to create reports and dashboards on their own, without depending on information technology staff or database administrators.
When you use Databricks as a data source with Power BI, you can bring the advantages of Databricks performance and technology beyond data scientists and data engineers to all business users.
You can connect Power BI Desktop to your Databricks clusters using the built-in Spark connector. As a bonus, this connector lets you use DirectQuery to offload processing to Databricks, which is useful when you have a massive amount of data that you don’t want to load into Power BI or when you want to perform near real-time analysis.
This article describes how to use Power BI with Databricks.
Download and install Power BI Desktop.
Get a personal access token.
Get your cluster’s server hostname, port, and HTTP path using the instructions in Server hostname, port, HTTP path, and JDBC URL.
Construct the server address to use in your Spark cluster connection in Power BI Desktop:
Use the scheme
Append the server hostname after the scheme.
Append the HTTP path after the server hostname. For example:
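A minimal Python sketch of this construction, assuming the three pieces are joined in the order the steps describe. The helper name `build_server_address`, the `https` scheme, the hostname, and the HTTP path below are all illustrative placeholders rather than values taken from this article; substitute the scheme and the cluster details you actually obtained.

```python
def build_server_address(scheme: str, hostname: str, http_path: str) -> str:
    """Join a scheme, server hostname, and HTTP path into one server address.

    Hypothetical helper for illustration; it is not part of Power BI or
    Databricks.
    """
    # Normalize each piece so stray slashes or a trailing "://" on the
    # scheme do not produce a malformed address.
    scheme = scheme.strip().rstrip(":/")
    hostname = hostname.strip().strip("/")
    http_path = http_path.strip().lstrip("/")
    if not scheme or not hostname or not http_path:
        raise ValueError("scheme, hostname, and HTTP path are all required")
    # Scheme first, then the server hostname, then the HTTP path.
    return f"{scheme}://{hostname}/{http_path}"


# Placeholder values (assumptions, not real cluster details).
address = build_server_address(
    scheme="https",
    hostname="example.cloud.databricks.com",
    http_path="sql/protocolv1/o/0/0000-000000-example0",
)
print(address)
```

The resulting string is what you would paste into the Server field of the Spark dialog. The helper only tidies slashes; it does not check that the cluster exists or is reachable.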
Launch Power BI Desktop, click Get Data in the toolbar, and click More….
In the Get Data dialog, search for and select the Spark connector.
On the Spark dialog, configure your cluster connection.
- Server: Enter the server address that you constructed in Step 2.
- Protocol: Select HTTP.
- Data Connectivity mode: Select DirectQuery, which lets you offload processing to Spark. This is ideal when you have a large volume of data or when you want near real-time analysis.
Enter token in the User name field and the personal access token from Step 2 in the Password field.
Click Connect. The Power BI Navigator should display the data available for query in your Databricks cluster.