Tableau

Tableau is a popular visual analytics solution for Spark that provides fast, easy-to-use, interactive analytics.

You do not need to write Spark SQL or any other code to start answering questions about your data.

Connect your Databricks clusters to Tableau to get instant access to your Spark SQL tables.

Requirements

  • Tableau Desktop/Server, versions 9.3 and above.
  • The latest Simba Spark ODBC driver (at least version 1.2.0).

Note

  • The Spark SQL driver is installed by default with Tableau Desktop for Mac, versions 10.2 through 2018.1.
  • Make sure the Tableau Desktop driver and Tableau Server driver have the same version.

Connect Tableau to Spark 2.x clusters

Spark 2.x clusters expose JDBC/ODBC connections via HTTPS.

  1. Get your cluster’s server hostname, port, and HTTP path using the instructions in Connecting BI Tools.
  2. Launch Tableau, go to the Connect menu, click To a server > More..., and select Spark SQL.
  3. Enter your cluster’s server hostname and port.
  4. In the Type drop-down, select SparkThriftServer.
  5. In the Authentication drop-down, select Username and Password.
  6. In the Transport drop-down, select HTTP.
  7. Use your Databricks username and password. You can also use “token” as the username and a personal access token as the password.
  8. Enter your cluster’s HTTP Path.
  9. Click Sign In.
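
To check the same settings outside Tableau, the dialog fields above can be expressed as an ODBC connection string. The following is a minimal sketch, not an official recipe: the key names (Host, Port, SparkServerType, AuthMech, ThriftTransport, SSL, UID, PWD, HTTPPath) and their numeric values reflect my understanding of the Simba Spark ODBC driver's configuration options, and the driver name, hostname, port, HTTP path, and token shown are placeholders you would replace with your own values.

```python
def build_spark_odbc_connection_string(host, port, http_path, token,
                                       driver="Simba Spark ODBC Driver"):
    """Assemble an ODBC connection string that mirrors the Tableau dialog fields.

    The key names and numeric codes are assumptions based on the Simba Spark
    ODBC driver's documented options; verify them against your driver version.
    """
    fields = [
        ("Driver", driver),        # assumed name of the installed driver
        ("Host", host),            # step 3: server hostname
        ("Port", str(port)),       # step 3: port
        ("SparkServerType", "3"),  # step 4: SparkThriftServer
        ("AuthMech", "3"),         # step 5: Username and Password
        ("ThriftTransport", "2"),  # step 6: HTTP
        ("SSL", "1"),              # connections are served over HTTPS
        ("UID", "token"),          # step 7: literal "token" as the username
        ("PWD", token),            # step 7: personal access token as the password
        ("HTTPPath", http_path),   # step 8: HTTP path
    ]
    return ";".join(f"{key}={value}" for key, value in fields)


# Placeholder values for illustration only.
conn_str = build_spark_odbc_connection_string(
    host="example.cloud.databricks.com",
    port=443,
    http_path="sql/protocolv1/o/0/example-cluster",
    token="<personal-access-token>",
)
print(conn_str)
```

A string built this way could be passed to an ODBC client such as pyodbc.connect(conn_str) to confirm that the hostname, HTTP path, and credentials work before troubleshooting Tableau itself.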

Connect Tableau to Spark 1.x clusters

Follow the instructions in Spark 1.x Cluster Connectivity to connect Tableau to Spark 1.x clusters over an ODBC connection, using the binary protocol.

  • Server is your Spark driver’s public IP or hostname.
  • Port is 10000.
  • Authentication method is User Name and Password. Use your Databricks username and password.

Frequently asked questions (FAQ)

Tableau SparkSQL Connection vs. Generic ODBC Connection
These “connections” are overlays on a driver that add optimizations and stability improvements. In both cases, Tableau connects through the Simba ODBC driver. Use the Spark SQL connector for now, and make sure that the Tableau Desktop driver is the same version as the Tableau Server driver.
Customizing ODBC configurations

By default, the parameters from the connection URL override those in the Simba ODBC DSN. There are two ways to customize the ODBC configuration from Tableau:

  • .tds file for a single data source:
    1. Follow the instructions here to export the .tds file for the data source.
    2. Find the property line odbc-connect-string-extras='' in the .tds file and set the parameters. For example, to enable AutoReconnect and UseNativeQuery, you can change the line to odbc-connect-string-extras='AutoReconnect=1,UseNativeQuery=1'.
    3. Reload the .tds file by reconnecting to the data source.
  • .tdc file for all data sources:
    1. If you have never created a .tdc file before, add one to the folder Documents/My Tableau Repository/Datasources.
    2. Distribute the file to every developer's Tableau Desktop so that the customizations also apply when dashboards are shared.
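
The .tds edit described above can also be scripted, because an exported .tds file is XML. The sketch below sets odbc-connect-string-extras on every connection element that already carries that attribute. The sample document is a hypothetical, heavily trimmed stand-in for a real export, and the function name is illustrative rather than part of any Tableau tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down .tds content; a real export has many more elements.
SAMPLE_TDS = (
    "<datasource>"
    "<connection class='spark' odbc-connect-string-extras='' />"
    "</datasource>"
)


def set_connect_string_extras(tds_xml, extras):
    """Return (updated_xml, count) after setting odbc-connect-string-extras.

    Only <connection> elements that already have the attribute are changed,
    which matches the "find the property line" step in the instructions.
    """
    root = ET.fromstring(tds_xml)
    updated = 0
    for conn in root.iter("connection"):
        if "odbc-connect-string-extras" in conn.attrib:
            conn.set("odbc-connect-string-extras", extras)
            updated += 1
    return ET.tostring(root, encoding="unicode"), updated


new_xml, count = set_connect_string_extras(
    SAMPLE_TDS, "AutoReconnect=1,UseNativeQuery=1"
)
print(count)
print(new_xml)
```

ElementTree does not preserve the XML declaration or comments when it re-serializes a document, so keep a backup of the original .tds file before overwriting it with the output.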
[Simba][Hardy] (71) Failed to establish connection with unknown error
This error can occur in an unstable network environment. To fix it, make sure the Simba driver uses its AutoReconnect feature. To force Tableau to use AutoReconnect, append AutoReconnect=1 to the value of the odbc-connect-string-extras property.
Query rewritten by ODBC driver
Sometimes the ODBC driver rewrites queries from Tableau, which introduces performance overhead. To prevent the driver from rewriting queries, enable the UseNativeQuery feature by appending UseNativeQuery=1 to the value of the odbc-connect-string-extras property.
Fetching many rows is slow
For Databricks Runtime 3.5 and above, the cluster driver is optimized to use less heap memory when collecting large results, so it can serve more rows per fetch block than the Simba ODBC default. To take advantage of this, append RowsFetchedPerBlock=100000 to the value of the odbc-connect-string-extras property.
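
The three fixes above all append a parameter to the same odbc-connect-string-extras value, so it helps to merge them without duplicating a key that is already present. The helper below is a small illustrative sketch (the function name is hypothetical). It defaults to the comma separator used in the .tds example earlier on this page; that default is an assumption carried over from the example, so pass a different sep if your driver or Tableau version expects another delimiter.

```python
def append_extras(current, sep=",", **params):
    """Return the extras string with each key=value added or replaced.

    `current` is the existing odbc-connect-string-extras value (may be empty).
    Existing keys keep their position; new keys are appended in call order.
    """
    merged = {}
    for item in current.split(sep):
        item = item.strip()
        if not item:
            continue  # skip empty fragments, e.g. when `current` is ""
        key, _, value = item.partition("=")
        merged[key] = value
    for key, value in params.items():
        merged[key] = str(value)  # override an existing key or add a new one
    return sep.join(f"{key}={value}" for key, value in merged.items())


extras = append_extras("", AutoReconnect=1)
extras = append_extras(extras, UseNativeQuery=1, RowsFetchedPerBlock=100000)
print(extras)
```

Because existing keys are replaced rather than repeated, calling the helper again with a new value for the same parameter updates it in place.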