Salesforce ingestion connector FAQs
This page answers frequently asked questions about the Salesforce ingestion connector in Databricks Lakeflow Connect.
General managed connector FAQs
The answers in Managed connector FAQs apply to all managed connectors in Lakeflow Connect. Keep reading for Salesforce-specific FAQs.
Connector-specific FAQs
The answers in this section are specific to the Salesforce ingestion connector.
Which Salesforce products does the connector support?
Lakeflow Connect supports ingesting data from the Salesforce products in the following table. Databricks also offers a zero-copy connector in Lakehouse Federation to run federated queries on Salesforce Data Cloud.
Salesforce product | Lakeflow Connect support | Alternative options |
---|---|---|
Automotive Cloud | ✓ | |
B2B Commerce | ✓ | |
B2C Commerce Cloud | | Data Cloud |
Data Cloud | ✓ | |
Digital Engagement | ✓ | |
Education Cloud | ✓ | |
Energy and Utilities Cloud | ✓ | |
Experience Cloud | ✓ | |
Feedback Management | ✓ | |
Field Service | ✓ | |
Health Cloud | ✓ | |
Life Sciences Cloud | ✓ | |
Lightning Platform | ✓ | |
Loyalty Cloud | ✓ | |
Media Cloud | ✓ | |
Manufacturing Cloud | ✓ | |
Marketing Cloud | | Data Cloud |
Net Zero Cloud | ✓ | |
Non-Profit Cloud | ✓ | |
Order Management | ✓ | |
Platform (standard and custom objects) | ✓ | |
Public Sector Solutions | ✓ | |
Rebate Management | ✓ | |
Retail & Consumer Goods Cloud | ✓ | |
Revenue Cloud | ✓ | |
Sales Cloud | ✓ | |
Salesforce Maps | ✓ | |
Salesforce Scheduler | ✓ | |
Service Cloud | ✓ | |
What is the relationship between this connector and the federated connector for Salesforce Data Cloud?
These products are distinct. Databricks has partnered with Salesforce to offer a zero-copy connector that queries data in Salesforce Data Cloud. In contrast, Lakeflow Connect offers an ingestion connector that copies data from the Salesforce Platform. Used together, the two connectors let customers work with both their CDP data and their CRM data in the Data Intelligence Platform.
Which Salesforce APIs does Lakeflow Connect use?
The connector uses both Salesforce Bulk API 2.0 and Salesforce REST API v63. For each pipeline update, the connector chooses the API based on how much data it must ingest. The goal is to limit load on the Salesforce APIs. For a larger amount of data (for example, the initial load of a typical object or the incremental load of a very active object), the connector typically uses Bulk API. For a smaller amount of data (for example, the incremental load of a typical object or the initial load of a very small object), the connector typically uses REST API.
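For reference, the two APIs look like the following when called directly. This is a minimal sketch using the `requests` library rather than the connector's internal code; the instance URL and access token are placeholders.

```python
import requests

# Placeholder values; in practice these come from your Salesforce org and OAuth flow.
INSTANCE_URL = "https://your-domain.my.salesforce.com"
ACCESS_TOKEN = "..."
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# REST API: synchronous query, suited to smaller result sets (for example, incremental loads).
rest_resp = requests.get(
    f"{INSTANCE_URL}/services/data/v63.0/query",
    headers=HEADERS,
    params={"q": "SELECT Id, Name FROM Account WHERE SystemModstamp > 2025-01-01T00:00:00Z"},
)
print(rest_resp.json()["records"][:5])

# Bulk API 2.0: asynchronous query job, suited to large extracts (for example, initial loads).
bulk_resp = requests.post(
    f"{INSTANCE_URL}/services/data/v63.0/jobs/query",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"operation": "query", "query": "SELECT Id, Name FROM Account"},
)
print(bulk_resp.json()["id"])  # poll this job ID, then download the result set
```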
How does Databricks connect to Salesforce?
Databricks connects to the Salesforce APIs over HTTPS. Credentials are stored securely in Unity Catalog and can be retrieved only if the user running the ingestion flow has the appropriate permissions. You can optionally create a dedicated user in Salesforce for ingesting data. To restrict access to particular objects or columns, use the built-in Salesforce permissions to ensure that the ingestion user can't access those entities.
How many Salesforce objects can be ingested in one pipeline?
Databricks recommends limiting each Salesforce pipeline to 250 tables. If you need to ingest more objects, create multiple pipelines.
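For example, you could split a large object list into groups of at most 250 and create one pipeline per group. A minimal sketch; the object names are hypothetical.

```python
# Hypothetical list of Salesforce objects to ingest.
objects_to_ingest = [f"Custom_Object_{i}__c" for i in range(600)]

MAX_TABLES_PER_PIPELINE = 250

# Split the objects into groups of at most 250, one group per ingestion pipeline.
pipeline_groups = [
    objects_to_ingest[i : i + MAX_TABLES_PER_PIPELINE]
    for i in range(0, len(objects_to_ingest), MAX_TABLES_PER_PIPELINE)
]

for n, group in enumerate(pipeline_groups, start=1):
    print(f"salesforce_pipeline_{n}: {len(group)} tables")
```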
Is there a limit on the number of attributes per object?
No.
How does the connector incrementally pull updates?
The connector selects the cursor column from the following list, in order of preference: `SystemModstamp`, `LastModifiedDate`, `CreatedDate`, and `LoginTime`. For example, if `SystemModstamp` is unavailable, then it looks for `LastModifiedDate`. Objects that don't have any of these columns can't be ingested incrementally. Formula fields can't be ingested incrementally.
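The selection logic is roughly equivalent to the following sketch (illustrative only, not the connector's code):

```python
# Preference order for the incremental cursor column.
CURSOR_PREFERENCE = ["SystemModstamp", "LastModifiedDate", "CreatedDate", "LoginTime"]

def choose_cursor_column(object_fields: set[str]) -> str | None:
    """Return the first preferred cursor column the object has, or None."""
    for column in CURSOR_PREFERENCE:
        if column in object_fields:
            return column
    return None  # no cursor column: the object can't be ingested incrementally

# Example: LoginHistory exposes LoginTime but not SystemModstamp.
print(choose_cursor_column({"Id", "LoginTime", "UserId"}))  # LoginTime
print(choose_cursor_column({"Id", "Name"}))                 # None
```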
Why does the number of updates match the number of rows—even on incremental pipeline runs?
The connector fully downloads formula fields during each pipeline update. In parallel, it incrementally reads non-formula fields. Finally, it combines them into one table.
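Conceptually, each update behaves like the following pure-Python sketch: incremental changes update the non-formula fields, and a freshly downloaded snapshot of the formula fields is attached to every row. The field names are hypothetical, and this isn't the connector's implementation.

```python
# Existing non-formula data in the destination, keyed by record ID.
non_formula_table = {
    "001A": {"Name": "Acme"},
    "001B": {"Name": "Globex"},
}

# Incremental read: only records whose non-formula fields changed since the last cursor value.
incremental_changes = {
    "001A": {"Name": "Acme Corp"},
}

# Full download: formula fields for every record, refreshed on each update.
formula_snapshot = {
    "001A": {"Days_Since_Last_Order__c": 3},
    "001B": {"Days_Since_Last_Order__c": 42},
}

# Apply the incremental changes, then attach the freshly downloaded formula fields to every row.
non_formula_table.update(incremental_changes)
combined = {
    record_id: {**fields, **formula_snapshot.get(record_id, {})}
    for record_id, fields in non_formula_table.items()
}
print(combined)
```

Because the formula snapshot touches every record, every row is rewritten even when only a few non-formula fields changed.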
How does the connector handle retries?
The connector automatically retries on failure, using exponential backoff: it waits 1 second before the first retry, then 2 seconds, then 4 seconds, and so on. If failures persist, it eventually stops retrying until the next run of the pipeline. You can monitor this activity in the pipeline usage logs, and you can set up notifications for fatal failures.
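The retry behavior is the standard exponential-backoff pattern, sketched below; the attempt cap and the `fetch_page` callable are illustrative, not the connector's actual values.

```python
import time

def fetch_with_retries(fetch_page, max_attempts: int = 5):
    """Call fetch_page, retrying on failure with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fetch_page()
        except Exception as err:
            if attempt == max_attempts - 1:
                # Give up until the next pipeline run; surface the failure for notifications.
                raise
            wait_seconds = 2 ** attempt  # 1, 2, 4, 8, ...
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
```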
How does the connector handle Delta-incompatible data types?
Lakeflow Connect automatically transforms Salesforce data types to Delta-compatible data types. See Salesforce ingestion connector reference.
Does the connector support real-time ingestion?
No. If you're interested in this functionality, reach out to your account team.
How does the connector handle soft deletes?
Soft deletes are handled the same way as inserts and updates.
If your table has history tracking turned off: When a row is soft-deleted in Salesforce, it is deleted from the bronze table at the next sync. For example, suppose you have a pipeline that runs hourly. If you sync at 12:00 PM and a record is deleted at 12:30 PM, the deletion isn't reflected until the 1:00 PM sync.
If your table has history tracking turned on: The connector marks the original row as inactive by populating the `__END_AT` column (see the query example at the end of this answer).
There is one edge case: if records are deleted and then purged from the Salesforce Recycle Bin before the pipeline's next update, Databricks misses the deletes, and you must run a full refresh of the destination table to reflect them.
Note that some Salesforce objects, such as history objects, don't support soft deletes.
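For example, with history tracking turned on, you can read only the active rows by filtering on `__END_AT`. A minimal PySpark sketch, assuming a destination table named `main.salesforce.account` (the catalog, schema, and table names are placeholders):

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Active rows have no end timestamp; soft-deleted or superseded rows have __END_AT populated.
active_accounts = spark.table("main.salesforce.account").where("__END_AT IS NULL")
active_accounts.show()
```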
How does the connector handle hard deletes?
Hard deletes aren't captured automatically; you must run a full refresh of the destination table to reflect them.
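You can trigger the full refresh from the pipeline UI or programmatically. A minimal sketch using the Databricks Python SDK, assuming the Pipelines `start_update` call with `full_refresh_selection`; the pipeline ID and table name are placeholders.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Trigger an update that fully refreshes only the affected destination table.
w.pipelines.start_update(
    pipeline_id="<pipeline-id>",
    full_refresh_selection=["account"],  # placeholder table name
)
```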