Salesforce ingestion connector FAQs
This page answers frequently asked questions about the Salesforce ingestion connector in Databricks Lakeflow Connect.
General managed connector FAQs
The answers in Managed connector FAQs apply to all managed connectors in Lakeflow Connect. Keep reading for Salesforce-specific FAQs.
Which Salesforce products does the Salesforce ingestion connector support?
Lakeflow Connect supports ingesting data from the Salesforce products in the following table. Databricks also offers a zero-copy connector in Lakehouse Federation to run federated queries on Salesforce Data Cloud.
Salesforce product | Lakeflow Connect support | Alternative options |
---|---|---|
Automotive Cloud | ✓ | |
B2B Commerce | ✓ | |
B2C Commerce Cloud | | Data Cloud |
Data Cloud | ✓ | |
Digital Engagement | ✓ | |
Education Cloud | ✓ | |
Energy and Utilities Cloud | ✓ | |
Experience Cloud | ✓ | |
Feedback Management | ✓ | |
Field Service | ✓ | |
Health Cloud | ✓ | |
Life Sciences Cloud | ✓ | |
Lightning Platform | ✓ | |
Loyalty Cloud | ✓ | |
Media Cloud | ✓ | |
Manufacturing Cloud | ✓ | |
Marketing Cloud | | Data Cloud |
Net Zero Cloud | ✓ | |
Non-Profit Cloud | ✓ | |
Order Management | ✓ | |
Platform (standard and custom objects) | ✓ | |
Public Sector Solutions | ✓ | |
Rebate Management | ✓ | |
Retail & Consumer Goods Cloud | ✓ | |
Revenue Cloud | ✓ | |
Sales Cloud | ✓ | |
Salesforce Maps | ✓ | |
Salesforce Scheduler | ✓ | |
Service Cloud | ✓ | |
Which Salesforce connector should I use?
Databricks offers multiple connectors for Salesforce. There are two zero-copy connectors: the Salesforce Data Cloud file sharing connector and the Salesforce Data Cloud query federation connector. These allow you to query data in Salesforce Data Cloud without moving it. There is also a Salesforce ingestion connector that copies data from various Salesforce products, including Salesforce Data Cloud and Salesforce Sales Cloud.
The following table summarizes the differences between the Salesforce connectors in Databricks:
Connector | Use case | Supported Salesforce products |
---|---|---|
Salesforce Data Cloud file sharing | When you use the Salesforce Data Cloud file sharing connector in Lakehouse Federation, Databricks calls Salesforce Data-as-a-Service (DaaS) APIs to read data in the underlying cloud object storage location directly. Queries are run on Databricks compute without using the JDBC protocol. Compared to query federation, file sharing is ideal for federating a large amount of data. It offers improved performance for reading files from multiple data sources and better pushdown capabilities. See Lakehouse Federation for Salesforce Data Cloud File Sharing. | Salesforce Data Cloud |
Salesforce Data Cloud query federation | When you use the Salesforce Data Cloud query federation connector in Lakehouse Federation, Databricks uses JDBC to connect to source data and pushes queries down into Salesforce. See Run federated queries on Salesforce Data Cloud. | Salesforce Data Cloud |
Salesforce ingestion | The Salesforce ingestion connector in Lakeflow Connect allows you to create fully managed ingestion pipelines from Salesforce Platform data, including data in Salesforce Data Cloud and Salesforce Sales Cloud. This connector maximizes value by leveraging not only CDP data but also CRM data in the Data Intelligence Platform. See Ingest data from Salesforce. | Salesforce Data Cloud, Salesforce Sales Cloud, and more. For a comprehensive list of supported Salesforce products, see the FAQ Which Salesforce products does the Salesforce ingestion connector support? on this page. |
Which Salesforce APIs does the ingestion connector use?
The connector uses both Salesforce Bulk API 2.0 and Salesforce REST API v63. For each pipeline update, the connector chooses the API based on how much data it must ingest. The goal is to limit load on the Salesforce APIs. For a larger amount of data (for example, the initial load of a typical object or the incremental load of a very active object), the connector typically uses Bulk API. For a smaller amount of data (for example, the incremental load of a typical object or the initial load of a very small object), the connector typically uses REST API.
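The thresholds that drive this choice aren't documented. The following Python sketch only illustrates the idea of a volume-based decision; the function name and the 10,000-record cutoff are assumptions, not the connector's actual logic.

```python
# Illustrative sketch only: the real connector's thresholds and internals are not public.
# The 10,000-record cutoff below is an assumed placeholder, not a documented value.

def choose_salesforce_api(estimated_records: int, bulk_threshold: int = 10_000) -> str:
    """Pick an API for a pipeline update based on how much data must be ingested."""
    if estimated_records >= bulk_threshold:
        # Large loads (for example, the initial load of a typical object) favor
        # Bulk API 2.0, which is designed for high volumes and limits API load.
        return "Bulk API 2.0"
    # Small loads (for example, the incremental load of a typical object) favor the REST API.
    return "REST API"


print(choose_salesforce_api(estimated_records=2_500_000))  # Bulk API 2.0
print(choose_salesforce_api(estimated_records=300))        # REST API
```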
How does Databricks connect to Salesforce?
Databricks connects to the Salesforce APIs using HTTPS. Credentials are stored securely in Unity Catalog and can only be retrieved if the user running the ingestion flow has the appropriate permissions. You can optionally create a separate user in Salesforce for ingesting data. If there are particular objects or columns that you want to restrict access to, you can use the built-in Salesforce permissions to ensure that the ingestion user doesn't have access to those entities.
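If you create a dedicated ingestion user, you can confirm from outside Databricks which objects that user can actually read. This sketch uses the third-party `simple_salesforce` library (not part of Lakeflow Connect) and assumes username/password/security-token authentication; the credential values are placeholders.

```python
# Sketch: check which objects a dedicated Salesforce ingestion user can query.
# Uses the third-party simple_salesforce library; credentials below are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="ingestion-user@example.com",
    password="<password>",
    security_token="<security-token>",
)

# describe() lists the objects visible to this user; objects hidden by
# Salesforce profiles or permission sets won't appear as queryable.
queryable = [obj["name"] for obj in sf.describe()["sobjects"] if obj["queryable"]]
print(f"{len(queryable)} objects are queryable by the ingestion user")
print(queryable[:10])
```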
How many Salesforce objects can be ingested in one pipeline?
Databricks recommends limiting one Salesforce pipeline to 250 tables. If you need to ingest more objects, create multiple pipelines.
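If you have more than 250 objects to ingest, a simple way to plan the split is to chunk the object list into groups of at most 250, one group per pipeline. A generic planning sketch (the object names are made up):

```python
# Sketch: split a large list of Salesforce objects into groups of at most 250,
# one group per ingestion pipeline (per the recommended limit above).

def plan_pipelines(objects: list[str], max_tables_per_pipeline: int = 250) -> list[list[str]]:
    return [
        objects[i : i + max_tables_per_pipeline]
        for i in range(0, len(objects), max_tables_per_pipeline)
    ]


all_objects = [f"Custom_Object_{n}__c" for n in range(600)]  # example object names
pipelines = plan_pipelines(all_objects)
print(f"{len(pipelines)} pipelines needed")         # 3
print([len(group) for group in pipelines])          # [250, 250, 100]
```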
Is there a limit on the number of attributes per object?
No.
How does the connector incrementally pull updates?
The connector selects the cursor column from the following list, in order of preference: `SystemModstamp`, `LastModifiedDate`, `CreatedDate`, and `LoginTime`. For example, if `SystemModstamp` is unavailable, then it looks for `LastModifiedDate`. Objects that don't have any of these columns can't be ingested incrementally. Formula fields can't be ingested incrementally.
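The preference order can be pictured as a simple first-match lookup. This sketch is purely illustrative and is not the connector's code:

```python
# Sketch: pick an incremental cursor column in the documented preference order.
CURSOR_PREFERENCE = ["SystemModstamp", "LastModifiedDate", "CreatedDate", "LoginTime"]

def pick_cursor_column(object_fields: set[str]) -> str | None:
    """Return the first available cursor field, or None if the object has
    none of them (such objects can't be ingested incrementally)."""
    for candidate in CURSOR_PREFERENCE:
        if candidate in object_fields:
            return candidate
    return None


print(pick_cursor_column({"Id", "LastModifiedDate", "CreatedDate"}))  # LastModifiedDate
print(pick_cursor_column({"Id", "Name"}))                             # None
```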
Why does the number of updates match the number of rows—even on incremental pipeline runs?
The connector fully downloads formula fields during each pipeline update. In parallel, it incrementally reads non-formula fields. Finally, it combines them into one table.
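Conceptually, each update overwrites the formula columns for every record, while only the changed records get new non-formula values. The pandas sketch below is a conceptual illustration (the column names and values are made up), not the connector's implementation.

```python
# Conceptual sketch (not the connector's implementation): each pipeline update
# refreshes formula columns for every record, while non-formula columns change
# only for records modified since the last cursor value.
import pandas as pd

bronze = pd.DataFrame(            # destination table before the update
    {"Id": ["001A", "001B", "001C"],
     "Amount": [1000, 2500, 400],
     "Discounted_Amount__c": [900.0, 2250.0, 360.0]}
)

changed = pd.DataFrame(           # incremental read: non-formula fields only
    {"Id": ["001B"], "Amount": [3000]}
)

formula_snapshot = pd.DataFrame(  # full download: formula fields for all records
    {"Id": ["001A", "001B", "001C"],
     "Discounted_Amount__c": [900.0, 2700.0, 360.0]}
)

bronze = bronze.set_index("Id")
bronze.update(changed.set_index("Id"))            # update changed non-formula values
bronze.update(formula_snapshot.set_index("Id"))   # overwrite formula values for all rows
print(bronze.reset_index())
```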
How does the connector handle retries?
The connector automatically retries on failure, with exponential backoff: it waits 1 second before the first retry, then 2 seconds, then 4 seconds, and so on. If failures persist, it stops retrying until the next run of the pipeline. You can monitor this activity in the pipeline usage logs, and you can set up notifications for fatal failures.
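The behavior described above is standard exponential backoff. A minimal, generic sketch (not the connector's actual retry code):

```python
# Generic exponential-backoff sketch (not the connector's actual retry code).
import time

def call_with_backoff(request, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry `request` on failure, doubling the wait each time: 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as err:
            if attempt == max_attempts - 1:
                # Give up; in the connector's case, retries resume on the next pipeline run.
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)


# Example usage: wrap any callable that may fail transiently.
# call_with_backoff(lambda: some_flaky_api_call())
```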
How does the connector handle Delta-incompatible data types?
Lakeflow Connect automatically transforms Salesforce data types to Delta-compatible data types. See Salesforce ingestion connector reference.
Does the connector support real-time ingestion?
No. If you're interested in this functionality, reach out to your account team.
How does the connector handle soft deletes?
Soft deletes are handled the same way as inserts and updates.
If your table has history tracking turned off: When a row is soft-deleted in Salesforce, it is deleted from the bronze table at the next sync of the data. For example, suppose you have a pipeline that runs hourly and syncs at 12:00 PM. If a record is deleted at 12:30 PM, the deletion isn't reflected until the 1:00 PM sync.
If your table has history tracking turned on: The connector marks the original row as inactive by populating the `__END_AT` column (see the sketch at the end of this answer).
There is one edge case: if records are deleted and then purged from the Salesforce Recycle Bin before the pipeline's next update, Databricks misses the deletes, and you must run a full refresh of the destination table to reflect them.
Note that some Salesforce objects, like the history object, do not support soft deletes.
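To picture the history-tracking case, the pandas sketch below shows a soft delete closing out the active row by populating `__END_AT`; the `__START_AT` column, record IDs, and timestamps are assumed for illustration only.

```python
# Conceptual sketch: with history tracking on, a soft delete closes out the
# active row by populating __END_AT instead of removing the row.
# (__START_AT, the IDs, and the timestamps are assumed for illustration.)
import pandas as pd

history = pd.DataFrame(
    {"Id": ["001A", "001B"],
     "Name": ["Acme", "Globex"],
     "__START_AT": pd.to_datetime(["2025-01-01", "2025-01-01"]),
     "__END_AT": pd.to_datetime([None, None])}
)

# Record 001B is soft-deleted in Salesforce; the next sync marks it inactive.
deleted_at = pd.Timestamp("2025-01-15 13:00")
history.loc[history["Id"] == "001B", "__END_AT"] = deleted_at

active_rows = history[history["__END_AT"].isna()]
print(active_rows)   # only 001A remains active
```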
How does the connector handle hard deletes?
Hard deletes aren't captured automatically; you must run a full refresh of the destination table to reflect them.