This feature is in Public Preview.
Syncsort helps you break down data silos by integrating legacy, mainframe, and IBM data with Databricks. You can easily pull data from these sources into Delta Lake.
Here are the steps for using Syncsort with Databricks.
Syncsort authenticates with Databricks using a Databricks personal access token.
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens or personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
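As an illustration of token-based authentication, here is a minimal sketch of how an automated client can present a personal access token to the Databricks REST API as a bearer token. The environment variable name `DATABRICKS_TOKEN` and the helper `auth_headers` are assumptions made for this example; Syncsort itself only needs the token value entered in its connection settings.

```python
import os


def auth_headers(env_var: str = "DATABRICKS_TOKEN") -> dict:
    """Build HTTP headers that carry a personal access token as a bearer token.

    The token is read from an environment variable (name is an assumption)
    so that it never has to be hard-coded in a script.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to a Databricks personal access token")
    return {"Authorization": f"Bearer {token}"}
```

Reading the token from the environment rather than from source code keeps it out of version control, which matters most when the token belongs to a service principal shared by several jobs.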
Syncsort writes data to an S3 bucket, and the Databricks integration cluster reads data from that location. Therefore, the integration cluster requires secure access to the S3 bucket.
To access AWS resources, you can launch the Databricks integration cluster with an instance profile. The instance profile should have access to the staging S3 bucket and the target S3 bucket where you want to write the Delta tables. To create an instance profile and configure the integration cluster to use the role, follow the instructions in Configure S3 access with instance profiles.
As an alternative, you can use IAM credential passthrough, which enables user-specific access to S3 data from a shared cluster.
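For the instance profile option, the role behind the profile needs a policy that covers both buckets. The sketch below generates such a policy document. The bucket names are hypothetical, and the action list is an assumption about what a read/write workload typically needs; the linked instructions remain the authority on the exact permissions.

```python
import json


def s3_access_policy(staging_bucket: str, target_bucket: str) -> dict:
    """Return an IAM policy document granting access to both buckets."""
    buckets = [staging_bucket, target_bucket]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: listing applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{b}" for b in buckets],
            },
            {
                # Object-level permissions: apply to keys inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],
            },
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = s3_access_policy("example-staging-bucket", "example-delta-bucket")
print(json.dumps(policy, indent=2))
```

Splitting bucket-level and object-level actions into separate statements reflects how S3 matches resources: `s3:ListBucket` is evaluated against the bucket ARN, while object actions are evaluated against the `/*` key pattern.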
Set Cluster Mode to Standard.
Set Databricks Runtime Version to a Databricks runtime version.
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
Configure your cluster depending on your integration and scaling needs.
For cluster configuration details, see Create a cluster.
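If you prefer to define the integration cluster programmatically rather than through the UI, the settings above map onto a cluster specification of the kind accepted by the Databricks Clusters API. The following is a sketch under stated assumptions: the cluster name, runtime version string, node type, worker count, and instance profile ARN are all placeholders you would replace with your own values.

```python
import json
from typing import Optional


def integration_cluster_spec(
    spark_version: str,
    node_type_id: str,
    num_workers: int,
    instance_profile_arn: Optional[str] = None,
) -> dict:
    """Build a cluster specification with the two Delta write settings enabled."""
    spec = {
        "cluster_name": "syncsort-integration",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_conf": {
            # The two Spark configuration properties listed in the steps above.
            "spark.databricks.delta.optimizeWrite.enabled": "true",
            "spark.databricks.delta.autoCompact.enabled": "true",
        },
    }
    if instance_profile_arn:
        # Attach the instance profile that grants access to the S3 buckets.
        spec["aws_attributes"] = {"instance_profile_arn": instance_profile_arn}
    return spec


# All argument values below are placeholders, not recommendations.
spec = integration_cluster_spec(
    spark_version="<runtime-version>",
    node_type_id="<node-type>",
    num_workers=2,
    instance_profile_arn="arn:aws:iam::<account-id>:instance-profile/<profile-name>",
)
print(json.dumps(spec, indent=2))
```

Keeping the specification in code makes it easier to review the Spark configuration and the instance profile together, and to adjust `num_workers` as your integration and scaling needs change.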
To connect a Databricks cluster to Syncsort, you need the following JDBC/ODBC connection properties: the JDBC URL and the HTTP path. See Retrieve the connection details for the steps to obtain them.
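Once you have copied the JDBC URL, it can help to check that it contains the host and HTTP path you expect before pasting it into another tool. The sketch below splits a semicolon-delimited JDBC URL into its parts. The example URL is hypothetical and its exact shape is an assumption; use the value shown for your own cluster.

```python
def parse_jdbc_url(url: str) -> dict:
    """Split a semicolon-delimited JDBC URL into host, port, and properties."""
    head, *pairs = url.split(";")
    # head is expected to look like "jdbc:<subprotocol>://<host>:<port>/<schema>"
    authority = head.split("://", 1)[1].split("/", 1)[0]
    host, _, port = authority.partition(":")
    # Remaining segments are key=value properties such as httpPath.
    props = dict(p.split("=", 1) for p in pairs if "=" in p)
    return {
        "host": host,
        "port": port or None,
        "http_path": props.get("httpPath"),
        "properties": props,
    }


# Hypothetical URL, for illustration only.
example = (
    "jdbc:spark://example.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=sql/protocolv1/o/0/0000-000000-example0;AuthMech=3"
)
info = parse_jdbc_url(example)
print(info["host"], info["http_path"])
```

If `http_path` comes back as `None`, the URL you copied is probably missing the `httpPath` property, and the HTTP path has to be supplied to the connecting tool separately.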
Go to the Databricks and Connect for Big Data login page and follow the instructions.