Connect to Matillion
Matillion ETL is an ETL/ELT tool built specifically for cloud database platforms including Databricks. Matillion ETL has a modern, browser-based UI, with powerful, push-down ETL/ELT functionality.
You can integrate your Databricks SQL warehouses (formerly Databricks SQL endpoints) and Databricks clusters with Matillion.
Connect to Matillion using Partner Connect
This section describes how to use Partner Connect to simplify the process of connecting an existing SQL warehouse or cluster in your Databricks workspace to Matillion.
Requirements
See the requirements for using Partner Connect.
Steps to connect
To connect to Matillion using Partner Connect, follow the steps in this section.
Tip
If you have an existing Matillion account, Databricks recommends that you connect to Matillion manually. This is because the connection experience in Partner Connect is optimized for new partner accounts.
In the sidebar, click Partner Connect.
Click the Matillion tile.
The Email box displays the email address for your Databricks account. Matillion uses this email address to prompt you to either create a new Matillion account or sign in to your existing Matillion account.
Click Connect to Matillion ETL or Sign in.
A new tab opens in your browser that displays the Matillion Hub.
Complete the on-screen instructions in Matillion to create your 14-day trial Matillion account or to sign in to your existing Matillion account.
Important
If an error displays stating that someone from your organization has already created an account with Matillion, contact one of your organization’s administrators and have them add you to your organization’s Matillion account. After they add you, sign in to your existing Matillion account.
Complete the on-screen instructions to provide your job details, then click Continue.
Complete the on-screen instructions to create an organization, then click Continue.
Click the organization you created, then click Add Matillion ETL instance.
Click Continue in AWS.
The Amazon EC2 console opens.
Follow Launching Matillion ETL using Amazon Machine Image in the Matillion ETL documentation, starting with step 5. Then follow Accessing Matillion ETL on Amazon Web Services (EC2) in the Matillion ETL documentation.
Matillion ETL opens in your browser, and the Create Project dialog box displays.
Follow Create a Delta Lake on Databricks project in the Matillion documentation.
For the settings in the Delta Lake Connection section within these instructions, enter the following information:
For Workspace ID, enter the ID of your Databricks workspace. See Workspace instance names, URLs, and IDs.
For Username, enter the word token.
For Password, enter the value of a Databricks personal access token.
To get the Workspace ID and generate a personal access token, do the following:
Return to the Partner Connect tab in your browser.
Take note of the Workspace ID.
Click Generate a new token.
A new tab opens in your browser that displays the Settings page of the Databricks UI.
Click Generate new token.
Optionally enter a description (comment) and expiration period.
Click Generate.
Copy the generated personal access token and store it in a secure location.
Return to the Matillion tab in your browser.
For the settings in the Delta Lake Defaults section within these instructions, for Cluster, choose the name of the SQL warehouse or cluster.
Continue with Next steps.
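The token steps above use the Databricks UI. If you prefer to script them, the following Python sketch builds the equivalent request to the Databricks Token API create endpoint (/api/2.0/token/create), which accepts an optional comment and lifetime_seconds, and reads token_value from the JSON response. It assumes you already hold a credential that is allowed to call the API (passed as auth_token). The workspace URL and credential shown are placeholders. The request is built but not sent, so you can inspect it before use.

```python
import json
import urllib.request
from typing import Optional


def build_token_create_request(
    workspace_url: str,
    auth_token: str,
    comment: Optional[str] = None,
    lifetime_seconds: Optional[int] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST to the Token API's create endpoint."""
    body = {}
    if comment is not None:
        body["comment"] = comment  # matches the optional description in the UI
    if lifetime_seconds is not None:
        body["lifetime_seconds"] = lifetime_seconds  # matches the optional expiration
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.0/token/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_token_value(response_body: bytes) -> str:
    """Return the new personal access token from the API's JSON response."""
    return json.loads(response_body)["token_value"]


# Placeholder workspace URL and credential; replace them with your own values.
request = build_token_create_request(
    "https://example.cloud.databricks.com",
    auth_token="<existing-credential>",
    comment="Matillion ETL",
    lifetime_seconds=90 * 24 * 60 * 60,  # 90 days, expressed in seconds
)
```

To send it, pass the request to urllib.request.urlopen and call extract_token_value on the response body. As with the UI flow, store the returned token in a secure location and enter it as the Password in Matillion.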
Connect to Matillion manually
This section describes how to connect an existing SQL warehouse or cluster in your Databricks workspace to Matillion manually.
Note
You can connect to Matillion using Partner Connect to simplify the experience.
Requirements
Before you integrate with Matillion manually, you must have the following:
A Matillion ETL instance, which you can launch by using AWS CloudFormation, an Amazon Machine Image (AMI), or the AWS Marketplace.
A Databricks personal access token.
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
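To follow the service principal recommendation above, a workspace admin can create a token on behalf of a service principal through the Token Management API endpoint /api/2.0/token-management/on-behalf-of/tokens. This endpoint takes the service principal's application_id plus an optional comment and lifetime_seconds. The Python sketch below only assembles that request. It assumes the caller is a workspace admin holding admin_token, and the workspace URL and application ID shown are placeholders. Manage tokens for a service principal remains the authoritative procedure, and availability can differ by cloud.

```python
import json
import urllib.request
from typing import Optional


def build_obo_token_request(
    workspace_url: str,
    admin_token: str,
    application_id: str,
    comment: Optional[str] = None,
    lifetime_seconds: Optional[int] = None,
) -> urllib.request.Request:
    """Build (without sending) a request that creates a token owned by a service principal."""
    body = {"application_id": application_id}  # the service principal's application ID
    if comment is not None:
        body["comment"] = comment
    if lifetime_seconds is not None:
        body["lifetime_seconds"] = lifetime_seconds
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_token}",  # caller must be a workspace admin
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
obo_request = build_obo_token_request(
    "https://example.cloud.databricks.com/",
    admin_token="<admin-credential>",
    application_id="00000000-0000-0000-0000-000000000000",
    comment="Matillion ETL service principal",
)
```

The response carries the new token in token_value. That token belongs to the service principal rather than to a workspace user, and it is the value you would enter as the Password in Matillion.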
Steps to connect
To connect to Matillion manually, do the following:
Get the name of the existing compute resource that you want to use (a SQL warehouse or cluster) within your workspace. Later, you will choose that name to complete the connection between your compute resource and your Matillion ETL instance.
To view SQL warehouses in your workspace, click SQL Warehouses in the sidebar. To create a new SQL warehouse, see Create a SQL warehouse.
To view the clusters in your workspace, click Compute in the sidebar. To create a cluster, see Compute configuration reference.
Follow Connect to your Matillion ETL instance and log in to it in the Matillion documentation.
Follow Create a Delta Lake on Databricks project in the Matillion documentation.
For the settings in the Delta Lake Connection section within these instructions, enter the following information:
For Workspace ID, enter the ID of your Databricks workspace. See Workspace instance names, URLs, and IDs.
For Username, enter the word token.
For Password, enter the Databricks personal access token.
For the settings in the Delta Lake Defaults section within these instructions, for Cluster, choose the name of the SQL warehouse or cluster.
Continue with Next steps.
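The manual steps above depend on the exact name of the SQL warehouse or cluster that you choose for Cluster. If you would rather look the names up programmatically than in the sidebar, the following Python sketch calls two Databricks REST list endpoints. GET /api/2.0/sql/warehouses returns a warehouses array whose entries have a name field. GET /api/2.0/clusters/list returns a clusters array whose entries have a cluster_name field. The workspace URL and token you pass in are your own. The helper names fetch_json and compute_names are illustrative, not part of any Databricks library.

```python
import json
import urllib.request
from typing import List, Tuple

# List endpoints, relative to the workspace URL.
WAREHOUSES_PATH = "/api/2.0/sql/warehouses"
CLUSTERS_PATH = "/api/2.0/clusters/list"


def fetch_json(workspace_url: str, token: str, path: str) -> dict:
    """Send an authenticated GET to a Databricks REST endpoint and decode the JSON body."""
    request = urllib.request.Request(
        workspace_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compute_names(warehouses_payload: dict, clusters_payload: dict) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for every SQL warehouse and cluster in the payloads."""
    # A list key may be absent when there is nothing to list, so default to an empty list.
    names = [("SQL warehouse", w["name"]) for w in warehouses_payload.get("warehouses", [])]
    names += [("cluster", c["cluster_name"]) for c in clusters_payload.get("clusters", [])]
    return names
```

For example, compute_names(fetch_json(url, token, WAREHOUSES_PATH), fetch_json(url, token, CLUSTERS_PATH)) returns one (kind, name) pair per SQL warehouse and cluster that the token can see. The name you pick is the one to choose for Cluster in Matillion.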