Managed connectors in Lakeflow Connect
Managed SaaS and database connectors provided by Lakeflow Connect are in various release states.
This article provides an overview of managed connectors in Databricks Lakeflow Connect for ingesting data from SaaS applications and databases. The resulting ingestion pipeline is governed by Unity Catalog and is powered by serverless compute and DLT. Managed connectors use efficient incremental reads and writes to make data ingestion faster, more scalable, and more cost-efficient, while keeping your data fresh for downstream consumption.
SaaS connector components
A SaaS connector is modeled by the following components:
- Connection: A Unity Catalog securable object that stores authentication details for the SaaS application.
- Ingestion pipeline: Ingests data from the source application into Delta tables. This component is modeled as a serverless DLT pipeline.
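To make the two components concrete, the following sketch builds the kind of JSON payload you could send to the Pipelines REST API (`POST /api/2.0/pipelines`) to create a SaaS ingestion pipeline. All names (connection, catalog, schema, tables) are hypothetical, and the exact field names of the ingestion definition should be verified against the current API reference.

```python
import json

# Hypothetical names throughout; field names follow the Pipelines API's
# ingestion pipeline definition and should be checked against current docs.
pipeline_spec = {
    "name": "salesforce_ingest",     # hypothetical pipeline name
    "serverless": True,              # SaaS connectors run on serverless DLT
    "catalog": "main",               # Unity Catalog destination catalog
    "ingestion_definition": {
        "connection_name": "my_salesforce_conn",  # UC connection holding auth details
        "objects": [
            {
                "table": {
                    "source_table": "Account",        # source object in the SaaS app
                    "destination_catalog": "main",
                    "destination_schema": "sales",
                }
            }
        ],
    },
}

print(json.dumps(pipeline_spec, indent=2))
```

The connection is created once (it stores credentials) and can be reused by multiple ingestion pipelines that pull different objects from the same application.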
Database connector components
A database connector is modeled by the following components:
- Connection: A Unity Catalog securable object that stores authentication details for the database.
- Gateway: Extracts data from the source database and maintains the integrity of transactions during the transfer. For cloud-based databases, the gateway is configured as a DLT pipeline with classic compute.
- Staging storage: A Unity Catalog volume where data from the gateway is staged before being applied to a Delta table. The staging volume is created when you deploy the gateway and exists within the catalog and schema that you specify.
- Ingestion pipeline: Ingests the staged data into Delta tables. This component is modeled as a serverless DLT pipeline.
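A database connector therefore involves two pipeline specs: one for the gateway (which also declares where the staging volume lives) and one for the serverless ingestion pipeline that applies the staged changes. The sketch below builds both payloads for the Pipelines REST API; all names are hypothetical, and the gateway and ingestion field names should be verified against the current API reference.

```python
import json

# Hypothetical names throughout; field names mirror the Pipelines API's
# gateway and ingestion definitions and should be checked against current docs.
gateway_spec = {
    "name": "sqlserver_gateway",
    "gateway_definition": {
        "connection_name": "my_sqlserver_conn",  # UC connection holding auth details
        "gateway_storage_catalog": "main",       # catalog for the staging volume
        "gateway_storage_schema": "staging",     # schema for the staging volume
        "gateway_storage_name": "sqlserver_staging",
    },
}

ingest_spec = {
    "name": "sqlserver_ingest",
    "serverless": True,  # the ingestion pipeline runs on serverless DLT
    "ingestion_definition": {
        # ID of the gateway pipeline created above (placeholder here)
        "ingestion_gateway_id": "<gateway-pipeline-id>",
        "objects": [
            {
                "table": {
                    "source_schema": "dbo",
                    "source_table": "orders",
                    "destination_catalog": "main",
                    "destination_schema": "sales",
                }
            }
        ],
    },
}

print(json.dumps([gateway_spec, ingest_spec], indent=2))
```

Creating the gateway first and referencing it from the ingestion pipeline reflects the data flow described above: source database, gateway, staging volume, then Delta tables.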
Lakeflow Connect vs. Lakehouse Federation vs. Delta Sharing
Lakehouse Federation allows you to query external data sources without moving your data. Delta Sharing allows you to securely share live data across platforms, clouds, and regions. Databricks recommends ingestion with Lakeflow Connect because it scales to accommodate high data volumes, low-latency querying, and third-party API limits. However, in some scenarios you might prefer to query or share your data without moving it.
When you have a choice between Lakeflow Connect, Lakehouse Federation, and Delta Sharing, choose Delta Sharing for the following scenarios:
- Limiting data duplication.
- Querying the freshest possible data.
Choose Lakehouse Federation for the following scenarios:
- Ad hoc reporting or proof-of-concept work on your ETL pipelines.
Managed connectors vs. Auto Loader
Managed connectors allow you to incrementally ingest data from enterprise applications and databases. Auto Loader is a connector for cloud object storage that allows you to incrementally ingest files as they arrive in S3, ADLS, and GCS. It is compatible with Structured Streaming and DLT but is not fully managed.
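For comparison, a minimal Auto Loader read looks like the sketch below. The bucket path and schema location are hypothetical; the `cloudFiles` options shown are the documented Auto Loader settings. The Spark call is commented out because it requires a Databricks runtime with a `SparkSession`.

```python
# Hypothetical paths; "cloudFiles.format" and "cloudFiles.schemaLocation"
# are standard Auto Loader options.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/Volumes/main/default/_schemas/orders",
}

# In a Databricks notebook or job, where `spark` is available:
# df = (spark.readStream
#         .format("cloudFiles")          # Auto Loader source
#         .options(**autoloader_options)
#         .load("s3://my-bucket/orders/"))

print(autoloader_options)
```

Unlike a managed connector, you own the pipeline code here: Auto Loader handles incremental file discovery, but scheduling, schema management choices, and downstream transformations are up to you.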
Can managed connectors write back to third-party apps and databases?
No. If you’re interested in this functionality, reach out to your account team.
What is the cost for managed connectors?
Managed connectors use a compute-based pricing model.
SaaS sources like Salesforce and Workday, which run exclusively on serverless infrastructure, incur serverless DLT DBU charges.
For database sources like SQL Server, ingestion gateways can run in classic mode or serverless mode depending on the source, and ingestion pipelines run on serverless. As a result, you can receive both classic and serverless DLT DBU charges.
For rate details, see the DLT pricing page.
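As a rough sketch of how the mixed billing model adds up for a database connector, the calculation below combines classic DBUs (gateway) and serverless DBUs (ingestion pipeline). The rates are purely illustrative assumptions, not actual prices; see the DLT pricing page for real per-region rates.

```python
# Assumed, illustrative rates only -- NOT actual Databricks prices.
CLASSIC_DLT_RATE = 0.20      # USD per classic DLT DBU (assumed)
SERVERLESS_DLT_RATE = 0.36   # USD per serverless DLT DBU (assumed)

def estimated_cost(classic_dbus: float, serverless_dbus: float) -> float:
    """Combined charge for a classic gateway plus a serverless ingestion pipeline."""
    return classic_dbus * CLASSIC_DLT_RATE + serverless_dbus * SERVERLESS_DLT_RATE

# e.g. 10 classic DBUs (gateway) + 5 serverless DBUs (ingestion pipeline)
print(round(estimated_cost(10, 5), 2))
```

SaaS connectors would use only the serverless term, since they run exclusively on serverless infrastructure.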
Dependence on external services
Databricks SaaS, database, and other managed connectors depend on the accessibility, compatibility, and stability of the application, database, or external service they connect to. Databricks does not control these external services and, therefore, has limited (if any) influence over their changes, updates, and maintenance. If changes, disruptions, or circumstances related to an external service impede or render impractical the operation of a connector, Databricks may discontinue or cease maintaining that connector. Databricks will make reasonable efforts to notify customers of discontinuation or cessation of maintenance, including updates to the applicable documentation.