Veeva Vault connector limitations

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

The Veeva Vault connector has the following limitations.

General limitations

When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This differs from the behavior of Spark Declarative Pipelines on Lakeflow.
During source maintenance periods, Databricks might not be able to access your data.
If a source table name conflicts with an existing destination table name, the pipeline update fails.
Multi-destination pipeline support is API-only.
You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table.
Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
The connector assumes that the cursor columns in the source system are monotonically increasing.
The connector ingests raw data without transformations. To transform the data, use downstream Spark Declarative Pipelines on Lakeflow.
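
Because two tables with the same name can't be ingested in one pipeline, it can help to check a planned table selection before creating the pipeline. The following is a minimal sketch, not part of the connector: the function name, the schema names, and the table names are hypothetical, and the case-insensitive comparison is an assumption.

```python
from collections import defaultdict


def find_duplicate_table_names(tables):
    """Return table names selected from more than one source schema.

    `tables` is an iterable of (schema, table) pairs. Names are compared
    case-insensitively, which is an assumption of this sketch.
    """
    schemas_by_name = defaultdict(list)
    for schema, table in tables:
        schemas_by_name[table.lower()].append(schema)
    # Keep only the names that appear under two or more schemas.
    return {
        name: schemas
        for name, schemas in schemas_by_name.items()
        if len(schemas) > 1
    }


# Hypothetical selection: the same table name under two source schemas.
selection = [
    ("schema_a", "product__v"),
    ("schema_b", "product__v"),
    ("schema_a", "study__v"),
]
duplicates = find_duplicate_table_names(selection)
```

Any name returned here would need to go into a separate pipeline.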

Authentication

Only OAuth 2.0 Machine-to-Machine (M2M) authentication via an external OIDC identity provider (Microsoft Entra ID) is supported. Username and password authentication is not supported.

Pipeline scheduling

Veeva generates incremental archives every 15 minutes. Pipeline updates scheduled more frequently than every 15 minutes find no new data to ingest.
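
As an illustration, a planned schedule interval can be compared against the 15-minute archive cadence. This is a hedged sketch in plain Python: the constant and function names are hypothetical and are not part of any Databricks or Veeva API.

```python
from datetime import timedelta

# Cadence at which Veeva generates incremental archives (from this page).
ARCHIVE_INTERVAL = timedelta(minutes=15)


def is_useful_interval(schedule_interval: timedelta) -> bool:
    """Return True if every scheduled update can expect a new archive."""
    return schedule_interval >= ARCHIVE_INTERVAL
```

For example, `is_useful_interval(timedelta(minutes=5))` returns `False`, because two out of every three updates would find no new archive.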

Archive retention

Veeva retains incremental archives for 10 days and full archives for 2 days. If a pipeline falls more than 10 days behind, the incremental archive chain is broken and a full refresh is required.
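
The retention rule above can be expressed as a simple check on the time of the last successful update. This is a minimal sketch: the function name is hypothetical, and how you obtain the last successful update time is left to your own monitoring.

```python
from datetime import datetime, timedelta, timezone

# Retention window for incremental archives (from this page).
INCREMENTAL_RETENTION = timedelta(days=10)


def needs_full_refresh(last_successful_update, now=None):
    """Return True if the pipeline is more than 10 days behind.

    Both arguments are timezone-aware datetimes. `now` defaults to the
    current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_successful_update > INCREMENTAL_RETENTION
```

A check like this can drive an alert well before the 10-day limit is reached, so that the incremental archive chain is not broken.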

Full refresh behavior

When a full refresh is triggered, the process spans two pipeline updates: the first update clears the staged archive state from the Unity Catalog volume, and the actual full data reload occurs in the subsequent update.

ID field data types

id fields are always stored as the STRING type in Databricks, regardless of the type declared in Veeva. This is required for the primary-key functionality of Lakeflow pipelines to work correctly.

Schema changes

Databricks recommends performing a full refresh after schema changes in Veeva to ensure they are visible in your destination tables.

The connector handles schema changes as follows:

Field deletion: The column remains in the destination table, but all values are set to null and are no longer queryable.
Field rename: Existing records are discoverable under the old field name. New records created after the rename appear under the new field name.
Object deletion: Deleted objects remain discoverable in the schema.
Object rename: The old object name remains in the schema. New records added under the new object name appear under the new table name.
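
After a field rename, values for one logical field are split across the old and new columns until a full refresh is run. A downstream consumer can read both columns and prefer the new one. The following sketch shows the idea on plain Python dictionaries; the function name and the field names are hypothetical, and it assumes a missing value is represented as `None`.

```python
def coalesce_renamed_field(row, old_name, new_name):
    """Return the value under the new field name, else the old one.

    `row` is a dict representing one record. Records created before the
    rename carry the value under `old_name`, later records under `new_name`.
    """
    value = row.get(new_name)
    if value is None:
        value = row.get(old_name)
    return value


# Hypothetical records written before and after a rename.
before_rename = {"id": "V01", "status__c": "active", "state__c": None}
after_rename = {"id": "V02", "status__c": None, "state__c": "retired"}

values = [
    coalesce_renamed_field(r, "status__c", "state__c")
    for r in (before_rename, after_rename)
]
```

In SQL, the same logic corresponds to a `COALESCE` over the new and old columns.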

System table support

The initial release supports ingestion from a fixed set of __sys tables:

DOCUMENT_VERSION
DOCUMENT_RELATIONSHIP
PICKLIST
WORKFLOW
WORKFLOW_ITEM
WORKFLOW_TASK
WORKFLOW_TASK_ITEM
ACTIVE_LEGACY_WORKFLOW
ACTIVE_LEGACY_WORKFLOW_TASK
INACTIVE_LEGACY_WORKFLOW
INACTIVE_LEGACY_WORKFLOW_TASK

Other __sys tables are not ingested in this release. A subsequent release will expand support to all system tables available in your Vault. After that release, a full refresh will be required to ingest the newly supported tables.
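
To see which system tables in a planned selection are ingested in this release, the list above can be encoded as a set. This is an illustrative sketch: the function name and the example table `AUDIT_TRAIL` are hypothetical, and the case-insensitive match is an assumption.

```python
# System tables supported in the initial release (from the list above).
SUPPORTED_SYS_TABLES = frozenset({
    "DOCUMENT_VERSION",
    "DOCUMENT_RELATIONSHIP",
    "PICKLIST",
    "WORKFLOW",
    "WORKFLOW_ITEM",
    "WORKFLOW_TASK",
    "WORKFLOW_TASK_ITEM",
    "ACTIVE_LEGACY_WORKFLOW",
    "ACTIVE_LEGACY_WORKFLOW_TASK",
    "INACTIVE_LEGACY_WORKFLOW",
    "INACTIVE_LEGACY_WORKFLOW_TASK",
})


def split_by_support(requested):
    """Split requested system tables into (supported, unsupported) lists."""
    supported = [t for t in requested if t.upper() in SUPPORTED_SYS_TABLES]
    unsupported = [t for t in requested if t.upper() not in SUPPORTED_SYS_TABLES]
    return supported, unsupported


# "AUDIT_TRAIL" is a hypothetical name used only to show the unsupported case.
supported, unsupported = split_by_support(
    ["PICKLIST", "workflow_task", "AUDIT_TRAIL"]
)
```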
