Alteryx

This article describes how to use Alteryx with Databricks.

Requirements

Alteryx 10.6 and above. In-database processing requires 64-bit Alteryx with 64-bit database drivers.

Step 1: Get Databricks connection information

  1. Get a personal access token.
  2. Get the server hostname, port, and HTTP path.
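
Before entering these values in the ODBC dialog, it can help to check that they are well-formed. The following is a minimal Python sketch, not part of Databricks or Alteryx. The class name `ConnectionInfo`, its fields, and the example values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConnectionInfo:
    """Hypothetical container for the values gathered in this step."""
    hostname: str
    port: int
    http_path: str

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the values look sane."""
        issues = []
        # The ODBC Host(s) field expects a bare hostname, not a URL.
        if not self.hostname or "://" in self.hostname or "/" in self.hostname:
            issues.append("hostname should not include a scheme or path")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not self.http_path.strip("/"):
            issues.append("HTTP path is empty")
        return issues


# Placeholder values only; substitute the ones from your workspace.
info = ConnectionInfo(
    hostname="example.cloud.databricks.com",
    port=443,
    http_path="sql/protocolv1/o/0/0000-000000-example0",
)
print(info.problems())
```

An empty list means none of the three checks fired. The checks are purely syntactic and do not contact the server.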

Step 2: Configure the Simba Spark ODBC driver

  1. Open the ODBC Admin console that matches the driver architecture (32-bit or 64-bit).
  2. On the User DSN tab, click Add.
  3. Select Simba Spark ODBC Driver.
  4. Enter the following:
    • Data Source Name: Databricks
    • Description: (optional)
    • Spark Server Type: SparkThriftServer
    • Host(s): host from Step 1.
    • Port: port from Step 1.
    • Authentication: HTTP
    • Mechanism: Token
    • User Name: token
    • Password: personal access token from Step 1.
  5. Select Save Password (Encrypted).
  6. Select Advanced Options.
  7. Select Fast SQLPrepare.
  8. Select Get Tables With Query.
  9. Select Show System Table.
  10. Click Test to verify the connection.
  11. Click OK.
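
The same settings can also be exercised outside Alteryx, which helps separate driver problems from Alteryx problems. The sketch below assembles a DSN-less connection string from the values entered above. It assumes the Simba Spark ODBC driver's connection-string keys (`Host`, `Port`, `HTTPPath`, `SSL`, `ThriftTransport`, `AuthMech`, `SparkServerType`, `UID`, `PWD`) and the numeric codes noted in the comments; verify both against the install guide for your driver version. The function name and placeholder values are hypothetical, and the commented-out `pyodbc` calls assume that third-party package is installed.

```python
def build_connection_string(host, port, http_path, token,
                            driver="Simba Spark ODBC Driver"):
    """Assemble a DSN-less ODBC connection string (a sketch, not an official helper)."""
    parts = {
        "Driver": driver,
        "Host": host,
        "Port": str(port),
        "HTTPPath": http_path,
        "SSL": "1",               # assumed: 1 = enable SSL
        "ThriftTransport": "2",   # assumed: 2 = HTTP transport
        "AuthMech": "3",          # assumed: 3 = user name and password
        "SparkServerType": "3",   # assumed: 3 = Spark Thrift Server
        "UID": "token",           # literal user name "token", as in the dialog
        "PWD": token,             # the personal access token
    }
    # Dicts preserve insertion order, so the keys appear in the order above.
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values only; substitute the ones from your workspace.
conn_str = build_connection_string(
    host="example.cloud.databricks.com",
    port=443,
    http_path="sql/protocolv1/o/0/0000-000000-example0",
    token="<personal-access-token>",
)

# To run a trivial query (requires pyodbc and the installed driver):
# import pyodbc
# conn = pyodbc.connect(conn_str, autocommit=True)
# print(conn.cursor().execute("SELECT 1").fetchone())
# conn.close()
```

Alternatively, `pyodbc.connect("DSN=Databricks", autocommit=True)` would test the saved data source directly, assuming it was created under the name used above.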

Step 3: Configure connection in Alteryx to a Databricks cluster

  1. In Alteryx Designer, go to the In-Database tool tab.
  2. Drag a Connect In-DB tool onto the canvas.
  3. In the Configuration Panel, click the drop-down menu under Connection Name.
  4. Select Manage Connections…
  5. For Connection Type, select User.
  6. Under Connections, click New and enter the following:
    • Connection Name: Databricks (or other preferred name)
    • Password Encryption: Encrypt for User (or other if preferred)
  7. Click the Read tab and enter the following:
    1. Driver: Spark ODBC
    2. Click the drop-down menu under Connection String.
    3. Select New Database Connection…
      1. Click the Spark Data Source Name drop-down and select Databricks (User).
      2. Click OK.
  8. Click the Write tab and enter the following:
    1. Driver: Databricks Bulk Loader (Avro) or Databricks Bulk Loader (CSV)
    2. Click the drop-down menu under Connection String.
    3. Select New Databricks Connection…
    4. Under ODBC Data Source, select Databricks (User).
      • In the Username field, enter token.
      • In the Password field, enter your personal access token from Step 1.
      • In Databricks URL, enter https:// followed by the host from Step 1.
  9. Click OK.
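
The Databricks URL entered on the Write tab is simply `https://` followed by the server hostname. A small sketch of that construction follows. It tolerates a hostname pasted with a scheme or a trailing slash. The function name `databricks_url` and the example hostname are hypothetical.

```python
def databricks_url(host: str) -> str:
    """Build the Databricks URL from a server hostname (illustrative helper)."""
    host = host.strip()
    # Drop a scheme if one was pasted along with the hostname.
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
            break
    host = host.rstrip("/")
    if not host:
        raise ValueError("host is empty")
    return "https://" + host


# Placeholder hostname; substitute the one from your workspace.
print(databricks_url("example.cloud.databricks.com"))
```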