Salesforce ingestion connector FAQs
This page answers frequently asked questions about the Salesforce ingestion connector in Databricks Lakeflow Connect.
General managed connector FAQs
The answers in Managed connector FAQs apply to all managed connectors in Lakeflow Connect. Keep reading for Salesforce-specific FAQs.
Connector-specific FAQs
The answers in this section are specific to the Salesforce ingestion connector.
Which Salesforce products does the connector support?
Lakeflow Connect supports ingesting data from the Salesforce products in the following table. Databricks also offers a zero-copy connector in Lakehouse Federation to run federated queries on Salesforce Data Cloud.
Salesforce product | Lakeflow Connect support | Alternative options |
---|---|---|
Automotive Cloud | ✓ | |
B2B Commerce | ✓ | |
B2C Commerce Cloud | | Data Cloud |
Data Cloud | ✓ | |
Digital Engagement | ✓ | |
Education Cloud | ✓ | |
Energy and Utilities Cloud | ✓ | |
Experience Cloud | ✓ | |
Feedback Management | ✓ | |
Field Service | ✓ | |
Health Cloud | ✓ | |
Life Sciences Cloud | ✓ | |
Lightning Platform | ✓ | |
Loyalty Cloud | ✓ | |
Media Cloud | ✓ | |
Manufacturing Cloud | ✓ | |
Marketing Cloud | | Data Cloud |
Net Zero Cloud | ✓ | |
Non-Profit Cloud | ✓ | |
Order Management | ✓ | |
Platform (standard and custom objects) | ✓ | |
Public Sector Solutions | ✓ | |
Rebate Management | ✓ | |
Retail & Consumer Goods Cloud | ✓ | |
Revenue Cloud | ✓ | |
Sales Cloud | ✓ | |
Salesforce Maps | ✓ | |
Salesforce Scheduler | ✓ | |
Service Cloud | ✓ | |
What is the relationship between this connector and the federated connector for Salesforce Data Cloud?
These products are distinct. Databricks has partnered with Salesforce to offer a zero-copy connector that queries data in Salesforce Data Cloud. In contrast, Lakeflow Connect offers an ingestion connector that copies data from the Salesforce Platform. Used together, the two connectors let customers work with both their CDP data and their CRM data in the Data Intelligence Platform.
Which Salesforce APIs does Lakeflow Connect use?
The connector uses both Salesforce Bulk API 2.0 and Salesforce REST API v63. For each pipeline update, the connector chooses the API based on how much data it must ingest. The goal is to limit load on the Salesforce APIs. For a larger amount of data (for example, the initial load of a typical object or the incremental load of a very active object), the connector typically uses Bulk API. For a smaller amount of data (for example, the incremental load of a typical object or the initial load of a very small object), the connector typically uses REST API.
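For reference, the two APIs look like the following when called directly. This is a minimal sketch using the `requests` library rather than the connector's internal code; the instance URL and access token are placeholders.

```python
import requests

# Placeholder values; in practice these come from your Salesforce org and OAuth flow.
INSTANCE_URL = "https://your-domain.my.salesforce.com"
ACCESS_TOKEN = "..."
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# REST API: synchronous query, suited to smaller result sets (for example, incremental loads).
rest_resp = requests.get(
    f"{INSTANCE_URL}/services/data/v63.0/query",
    headers=HEADERS,
    params={"q": "SELECT Id, Name FROM Account WHERE SystemModstamp > 2025-01-01T00:00:00Z"},
)
print(rest_resp.json()["records"][:5])

# Bulk API 2.0: asynchronous query job, suited to large extracts (for example, initial loads).
bulk_resp = requests.post(
    f"{INSTANCE_URL}/services/data/v63.0/jobs/query",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"operation": "query", "query": "SELECT Id, Name FROM Account"},
)
print(bulk_resp.json()["id"])  # poll this job ID, then download the result set
```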
How does Databricks connect to Salesforce?
Databricks connects to the Salesforce APIs over HTTPS. Credentials are stored securely in Unity Catalog and can be retrieved only if the user running the ingestion flow has the appropriate permissions. You can optionally create a dedicated user in Salesforce for ingesting data. To restrict access to particular objects or columns, use the built-in Salesforce permissions to ensure that the ingestion user can't access those entities.
How many Salesforce objects can be ingested in one pipeline?
Databricks recommends limiting each Salesforce pipeline to 250 tables. If you need to ingest more objects, create multiple pipelines.
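For example, you could split a large object list into groups of at most 250 and create one pipeline per group. A minimal sketch; the object names are hypothetical.

```python
# Hypothetical list of Salesforce objects to ingest.
objects_to_ingest = [f"Custom_Object_{i}__c" for i in range(600)]

MAX_TABLES_PER_PIPELINE = 250

# Split the objects into groups of at most 250, one group per ingestion pipeline.
pipeline_groups = [
    objects_to_ingest[i : i + MAX_TABLES_PER_PIPELINE]
    for i in range(0, len(objects_to_ingest), MAX_TABLES_PER_PIPELINE)
]

for n, group in enumerate(pipeline_groups, start=1):
    print(f"salesforce_pipeline_{n}: {len(group)} tables")
```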
Is there a limit on the number of attributes per object?
No.
How does the connector incrementally pull updates?
The connector selects the cursor column from the following list, in order of preference: `SystemModstamp`, `LastModifiedDate`, `CreatedDate`, and `LoginTime`. For example, if `SystemModstamp` is unavailable, then it looks for `LastModifiedDate`. Objects that don't have any of these columns can't be ingested incrementally. Formula fields can't be ingested incrementally.
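The selection logic is roughly equivalent to the following sketch (illustrative only, not the connector's code):

```python
# Preference order for the incremental cursor column.
CURSOR_PREFERENCE = ["SystemModstamp", "LastModifiedDate", "CreatedDate", "LoginTime"]

def choose_cursor_column(object_fields: set[str]) -> str | None:
    """Return the first preferred cursor column the object has, or None."""
    for column in CURSOR_PREFERENCE:
        if column in object_fields:
            return column
    return None  # no cursor column: the object can't be ingested incrementally

# Example: LoginHistory exposes LoginTime but not SystemModstamp.
print(choose_cursor_column({"Id", "LoginTime", "UserId"}))  # LoginTime
print(choose_cursor_column({"Id", "Name"}))                 # None
```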
Why does the number of updates match the number of rows—even on incremental pipeline runs?
The connector fully downloads formula fields during each pipeline update. In parallel, it incrementally reads non-formula fields. Finally, it combines them into one table.
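Conceptually, each update behaves like the following pure-Python sketch: incremental changes update the non-formula fields, and a freshly downloaded snapshot of the formula fields is attached to every row. The field names are hypothetical, and this isn't the connector's implementation.

```python
# Existing non-formula data in the destination, keyed by record ID.
non_formula_table = {
    "001A": {"Name": "Acme"},
    "001B": {"Name": "Globex"},
}

# Incremental read: only records whose non-formula fields changed since the last cursor value.
incremental_changes = {
    "001A": {"Name": "Acme Corp"},
}

# Full download: formula fields for every record, refreshed on each update.
formula_snapshot = {
    "001A": {"Days_Since_Last_Order__c": 3},
    "001B": {"Days_Since_Last_Order__c": 42},
}

# Apply the incremental changes, then attach the freshly downloaded formula fields to every row.
non_formula_table.update(incremental_changes)
combined = {
    record_id: {**fields, **formula_snapshot.get(record_id, {})}
    for record_id, fields in non_formula_table.items()
}
print(combined)
```

Because the formula snapshot touches every record, every row is rewritten even when only a few non-formula fields changed.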
How does the connector handle retries?
The connector automatically retries on failure, using exponential backoff: it waits 1 second before the first retry, then 2 seconds, then 4 seconds, and so on. If failures persist, it eventually stops retrying until the next run of the pipeline. You can monitor this activity in the pipeline usage logs, and you can set up notifications for fatal failures.
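The retry behavior is the standard exponential-backoff pattern, sketched below; the attempt cap and the `fetch_page` callable are illustrative, not the connector's actual values.

```python
import time

def fetch_with_retries(fetch_page, max_attempts: int = 5):
    """Call fetch_page, retrying on failure with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fetch_page()
        except Exception as err:
            if attempt == max_attempts - 1:
                # Give up until the next pipeline run; surface the failure for notifications.
                raise
            wait_seconds = 2 ** attempt  # 1, 2, 4, 8, ...
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
```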
How does the connector handle Delta-incompatible data types?
Lakeflow Connect automatically transforms Salesforce data types to Delta-compatible data types. See Salesforce ingestion connector reference.
Does the connector support real-time ingestion?
No. If you're interested in this functionality, reach out to your account team.
How does the connector handle soft deletes?
Soft deletes are handled the same way as inserts and updates.
If your table has history tracking turned off: When a row is soft-deleted in Salesforce, it is deleted from the bronze table at the next sync. For example, suppose you have a pipeline that runs hourly. If you sync at 12:00 PM and a record is deleted at 12:30 PM, the deletion isn't reflected until the 1:00 PM sync.
If your table has history tracking turned on: The connector marks the original row as inactive by populating the `__END_AT` column (see the query example at the end of this answer).
There is one edge case: if records are deleted and then purged from the Salesforce Recycle Bin before the pipeline's next update, Databricks misses the deletes, and you must run a full refresh of the destination table to reflect them.
Note that some Salesforce objects, such as history objects, don't support soft deletes.
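For example, with history tracking turned on, you can read only the active rows by filtering on `__END_AT`. A minimal PySpark sketch, assuming a destination table named `main.salesforce.account` (the catalog, schema, and table names are placeholders):

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Active rows have no end timestamp; soft-deleted or superseded rows have __END_AT populated.
active_accounts = spark.table("main.salesforce.account").where("__END_AT IS NULL")
active_accounts.show()
```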
How does the connector handle hard deletes?
Hard deletes aren't captured automatically; you must run a full refresh of the destination table to reflect them.
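You can trigger the full refresh from the pipeline UI or programmatically. A minimal sketch using the Databricks Python SDK, assuming the Pipelines `start_update` call with `full_refresh_selection`; the pipeline ID and table name are placeholders.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Trigger an update that fully refreshes only the affected destination table.
w.pipelines.start_update(
    pipeline_id="<pipeline-id>",
    full_refresh_selection=["account"],  # placeholder table name
)
```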