Anomalo is a data quality validation platform that helps ensure your data is accurate, complete, consistent, and in line with your expectations. When connected to Databricks, Anomalo provides a unifying layer for verifying the quality of your data before it is consumed by business intelligence and analytics tools or by modeling and machine learning frameworks.
You can integrate your Databricks clusters and Databricks SQL warehouses (formerly Databricks SQL endpoints) with Anomalo.
To connect your Databricks workspace to Anomalo using Partner Connect, see Connect to a data quality partner solution using Partner Connect.
Partner Connect only supports Databricks SQL warehouses for Anomalo. To connect a cluster in your Databricks workspace to Anomalo, connect to Anomalo manually.
This section describes how to connect an existing SQL warehouse or cluster to Anomalo manually.
Before you connect to Anomalo manually, you must have the following:
A cluster or SQL warehouse in your Databricks workspace.
The connection details for your cluster or SQL warehouse, specifically the Server Hostname, Port, and HTTP Path values.
A Databricks personal access token. See Generate a personal access token.
As a security best practice, when authenticating with automated tools, systems, scripts, and apps, Databricks recommends you use access tokens belonging to service principals instead of workspace users. For more information, see Service principals for Databricks automation.
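Before entering these values in Anomalo, it can help to confirm that they are complete and that they actually open a connection. The following is a minimal sketch, not an official procedure. It assumes the values are stored in the environment variables DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN (names chosen here for illustration) and that the open source databricks-sql-connector package is installed for the optional live check. The helper names load_connection_details and check_connection are hypothetical.

```python
import os


def load_connection_details(env=None):
    """Collect the values listed in the requirements and sanity-check them.

    The environment variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    details = {
        "server_hostname": env.get("DATABRICKS_SERVER_HOSTNAME", "").strip(),
        "http_path": env.get("DATABRICKS_HTTP_PATH", "").strip(),
        "access_token": env.get("DATABRICKS_TOKEN", "").strip(),
    }
    missing = [name for name, value in details.items() if not value]
    if missing:
        raise ValueError("missing connection details: " + ", ".join(missing))
    # The Server Hostname is a bare host name, so a scheme such as
    # https:// or any path segment indicates a copy-paste mistake.
    if "/" in details["server_hostname"]:
        raise ValueError("server_hostname must not include a scheme or path")
    return details


def check_connection(details):
    """Open a connection and run a trivial query; requires network access."""
    # Imported lazily so the validation above works without the package.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=details["server_hostname"],
        http_path=details["http_path"],
        access_token=details["access_token"],
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1
```

Calling check_connection(load_connection_details()) from a machine that can reach your workspace should return True when the host name, HTTP path, and token are all valid. If it raises an error instead, the same values are likely to fail in Anomalo's connection test.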
To connect to Anomalo manually, do the following:
If you just signed up for Anomalo, on the Let’s start by adding a data source page, click Databricks. If you signed in to your existing Anomalo account, click + Connect a data source, and then click Databricks.
On the Connect to a data source page, enter a name for this data source.
For Server Hostname, enter the Server Hostname value from the requirements.
For HTTP Path, enter the HTTP Path value from the requirements.
For Personal Access Token, enter the token value from the requirements.
Click Continue to test the connection to Anomalo.
After the connection succeeds, on the Choose a schema page, select the target schema that you want Anomalo to use within your workspace.
On the Choose a table page, select the target table that you want Anomalo to start with within your workspace.
On the Configure page, provide settings for Anomalo to use for scheduling, data freshness, and alerts and notifications. For help, click the question mark icon next to each group of settings.
Click Save & view table.
Continue with next steps.
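If the schema or table you expect does not appear on the Choose a schema or Choose a table page, one way to investigate is to list what the same token can see directly in Databricks. The following is a rough sketch under stated assumptions: it takes any DB-API style cursor (for example, one from the databricks-sql-connector package), the helper names quote_identifier and list_tables are hypothetical, and it assumes the table name is the second column of the SHOW TABLES result.

```python
def quote_identifier(name):
    """Wrap a schema name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def list_tables(cursor, schema):
    """Return the table names in `schema` that the current credentials can see.

    `cursor` is any DB-API style cursor connected to the same cluster or
    SQL warehouse that was configured in Anomalo.
    """
    cursor.execute("SHOW TABLES IN " + quote_identifier(schema))
    # Assumption: each row is (schema, table name, is-temporary flag).
    return [row[1] for row in cursor.fetchall()]
```

A table that is missing from this list is usually a permissions or naming issue on the Databricks side rather than a problem in Anomalo.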