Skip to main content

Ingest Salesforce formula fields incrementally

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

By default, Lakeflow Connect doesn't ingest Salesforce formula fields incrementally. Instead, it takes a snapshot of formula fields on each pipeline run, then joins them with the rest of the table. However, you can enable incremental formula field ingestion, which often significantly improves performance and reduces costs.

How incremental formula field ingestion works

If you use the default snapshot approach, the end result of a Salesforce ingestion pipeline is a series of streaming tables (one per source table). If you use incremental formula field ingestion instead, the connector creates intermediary streaming tables to hold non-formula fields, but the end result is a materialized view that contains both formula fields and non-formula fields.

Salesforce cursor columns (like SystemModstamp or LastModifiedDate) don't track formula fields. When a formula output changes, Salesforce doesn't update the cursor columns. If you use the default snapshot approach, formula fields can silently change without being picked up by the ingestion pipeline. However, if you use incremental formula field ingestion, the connector recreates the formulas as materialized views, then recomputes the results incrementally.

When a non-formula field is deleted in Salesforce, the corresponding column in the destination is marked as inactive but is not deleted. If it reappears in the source, the pipeline fails with a duplicate column error. In contrast, when a formula field is deleted, it's removed from the final materialized view. If the field later reappears in the source, it's ingested into the materialized view with the latest data.

When you create a formula that returns a number in Salesforce, you can specify the scale (the number of digits allowed after the decimal). Salesforce automatically rounds the formula's result to the specified number of decimal places. To avoid cumulative rounding errors, the Databricks connector doesn't reproduce this behavior.

Limitations

The limitations in this section apply when you create an ingestion pipeline with incremental formula fields.

General limitations

  • You cannot enable this feature on an existing pipeline. You must create a new pipeline.
  • This feature is API-only. You cannot configure it using the Databricks UI.
  • The final materialized view does not support SCD type 2. However, you can view the history of non-formula fields in the intermediary streaming table.

Unsupported formula fields and functions

  • Fields that reference tables or columns that aren't currently ingested in the pipeline are not ingested incrementally.

  • The following formula functions are not supported:

    • GETSESSIONID
    • RUNASUSER
    • CURRENCYRATE
    • ISCLONE
    • VLOOKUP
    • GETRECORDIDS
    • PARENTGROUPVAL
    • PREVGROUPVAL
    • PRIORVALUE
    • ISCHANGED
    • ISNEW
    • PREDICT
    • MLPREDICT
    • IMAGEPROXYURL
    • JUNCTIONIDLIST
    • LINKTO
    • LINKGROUPVAL
    • REQUIRESCRIPT
    • URLFOR
    • GEOLOCATION
    • DISTANCE
    • HTMLENCODE
    • JSENCODE
    • JSINHTMLENCODE
    • URLENCODE
    • INCLUDE
  • Rollup summary fields and formulas that reference them are not supported.

  • Global entities (for example, $User, $Profile) are not supported.

  • Formulas that use ROUND(arg1, arg2) are not supported unless arg2 is a static integer.

  • Formulas with outputs that exceed the precision decimal(38,18), such as a formula that multiplies two decimals of precision (30,18). These cause the pipeline to fail. To exclude them, use the exclude_columns configuration. See Select columns to ingest.

Functions with ambiguous results

The following functions might yield different results in Databricks than in Salesforce:

Function

Behavior difference

LEN(TEXT(<decimal_number>)), such as LEN("123.45")

Because Databricks is more precise than Salesforce, this function returns a different value in Databricks than in Salesforce.

CHR(x)

In Salesforce, this function returns the character with the ASCII code specified by x. In Databricks, it returns the character whose ASCII code equals x modulo 256.

INITCAP("_mrSmith is _ok")

In Salesforce, this function capitalizes the first letter of every word, ignoring non-alphabetic characters, yielding "_MrSmith Is Ok". In Databricks, this function capitalizes the first character of every word only if it is a letter, yielding "_mrSmith Is _ok".

FORMAT_DURATION

In Salesforce, when two date-time inputs are exactly 30 minutes apart, the formula outputs 0:00:29:59 instead of the expected 0:00:30:00. This limitation doesn't exist in Databricks, so there might be a one-second gap between the source and destination values.

Create a pipeline with incremental formula fields

To enable incremental formula field ingestion, you must create a new pipeline with the configuration flag pipelines.enableSalesforceFormulaFieldsMVComputation: true. You cannot enable this feature on existing pipelines. For example code, see Example: Ingest formula fields incrementally.

Understand the error tracking table

When you enable incremental formula field ingestion, the connector creates an additional table to help with troubleshooting. This table always appears if the incremental formula fields flag is set, even if there are no errors. The error tracking table (named <pipeline-id>_formula_fields_error_reasons) has the following columns:

  • <source-object>: The name of the source object in Salesforce.
  • <formula-field>: The name of the formula field.
  • error: Error message that explains why the formula wasn't ingested incrementally.

If a formula field shows NULL values in your destination table, query the error tracking table to understand why the field wasn't ingested incrementally. For details, see Formula field values show as NULL (incremental ingestion).