Veeva Vault connector limitations
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.
The Veeva Vault connector has the following limitations.
General limitations
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This behavior is not consistent with Lakeflow Spark Declarative Pipelines behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table.
- Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
- The connector assumes that the cursor columns in the source system are monotonically increasing.
- The connector ingests raw data without transformations. Use downstream Lakeflow Spark Declarative Pipelines pipelines for transformations.
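Because a pipeline can't ingest two tables with the same name, even from different source schemas, it can help to check a planned table selection before you create the pipeline. The following is a minimal sketch, not part of the connector: the `(source_schema, source_table)` pair format and the function name are hypothetical, and the case-insensitive comparison is an assumption about how destination table names collide.

```python
from collections import defaultdict


def find_duplicate_table_names(tables):
    """Return {table_name: [schemas]} for table names selected more than once.

    `tables` is a hypothetical list of (source_schema, source_table) pairs.
    It is not the connector's actual pipeline specification format.
    """
    schemas_by_name = defaultdict(list)
    for source_schema, source_table in tables:
        # Assumption: names that differ only by case would also collide.
        schemas_by_name[source_table.lower()].append(source_schema)
    return {
        name: schemas
        for name, schemas in schemas_by_name.items()
        if len(schemas) > 1
    }


selected = [
    ("clinical", "study"),
    ("quality", "study"),
    ("quality", "deviation"),
]
print(find_duplicate_table_names(selected))
# {'study': ['clinical', 'quality']}
```

Any name reported here would need to be ingested in a separate pipeline or renamed, keeping in mind that renaming a table makes the pipeline API-only.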
Authentication
Only OAuth 2.0 Machine-to-Machine (M2M) authentication via an external OIDC identity provider (Microsoft Entra ID) is supported. Username and password authentication is not supported.
Pipeline scheduling
Veeva generates incremental archives every 15 minutes. If you schedule pipeline runs more frequently than every 15 minutes, the additional runs don't ingest any new data.
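As an illustration of the 15-minute cadence, the following sketch checks whether a planned run interval is at least as long as the archive interval. The function name is hypothetical and the check is independent of how you configure the schedule in Databricks.

```python
from datetime import timedelta

# Veeva's incremental archive cadence, as stated above.
ARCHIVE_INTERVAL = timedelta(minutes=15)


def schedule_is_useful(run_interval):
    """Return True if runs are spaced at least one archive interval apart."""
    return run_interval >= ARCHIVE_INTERVAL


print(schedule_is_useful(timedelta(minutes=5)))   # False
print(schedule_is_useful(timedelta(minutes=30)))  # True
```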
Archive retention
Veeva retains incremental archives for 10 days and full archives for 2 days. If a pipeline falls more than 10 days behind, the incremental archive chain is broken and a full refresh is required.
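The retention rule can be expressed as a simple date comparison. The following is a minimal sketch that assumes you track the time of the last successful pipeline update yourself. The function and parameter names are hypothetical and are not part of any Databricks or Veeva API.

```python
from datetime import datetime, timedelta, timezone

# Incremental archive retention window, as stated above.
INCREMENTAL_RETENTION = timedelta(days=10)


def needs_full_refresh(last_successful_update, now=None):
    """Return True if the pipeline is further behind than the retention window."""
    now = now or datetime.now(timezone.utc)
    return now - last_successful_update > INCREMENTAL_RETENTION


now = datetime(2025, 1, 20, tzinfo=timezone.utc)
print(needs_full_refresh(datetime(2025, 1, 15, tzinfo=timezone.utc), now))  # False
print(needs_full_refresh(datetime(2025, 1, 5, tzinfo=timezone.utc), now))   # True
```

A check like this could feed a monitoring job so that you learn about a broken incremental chain before the next scheduled update.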
Full refresh behavior
When a full refresh is triggered, the process spans two pipeline updates: the first update clears the staged archive state from the Unity Catalog volume, and the actual full data reload occurs in the subsequent update.
ID field data types
The id fields are always stored as the STRING type in Databricks, regardless of the type declared in Veeva. This is required for the primary-key functionality in Lakeflow Spark Declarative Pipelines to work correctly.
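One practical consequence is that IDs from other systems might need to be converted to strings before you compare them with ingested rows. The following sketch uses plain Python dictionaries to stand in for ingested rows. The row shape, the sample values, and the `normalize_id` helper are hypothetical.

```python
def normalize_id(value):
    """Convert an ID to the string form used by the ingested tables."""
    return None if value is None else str(value)


# Ingested rows always carry the ID as a string.
ingested_rows = [
    {"id": "1001", "name": "Study A"},
    {"id": "1002", "name": "Study B"},
]
# Hypothetical numeric IDs from another system.
reference_ids = [1001, 1003]

wanted = {normalize_id(i) for i in reference_ids}
matches = [row for row in ingested_rows if row["id"] in wanted]
print(matches)
# [{'id': '1001', 'name': 'Study A'}]
```

Without the conversion, the integer `1001` would not match the string `"1001"` and the lookup would return no rows.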
Schema changes
Databricks recommends performing a full refresh after schema changes in Veeva to ensure they are visible in your destination tables.
The connector handles schema changes as follows:
- Field deletion: The column remains in the destination table, but all values are set to null and are no longer queryable.
- Field rename: Existing records are discoverable under the old field name. New records created after the rename appear under the new field name.
- Object deletion: Deleted objects remain discoverable in the schema.
- Object rename: The old object name remains in the schema. New records added under the new object name appear under the new table name.
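After a field rename, values for the same logical field can be split across two columns until you run a full refresh. The following sketch shows one way downstream logic could read both, preferring the new name and falling back to the old one. The field names `status__c` and `state__c` and the helper function are hypothetical.

```python
def read_renamed_field(record, old_name, new_name):
    """Return the value under the new field name, falling back to the old one."""
    value = record.get(new_name)
    return value if value is not None else record.get(old_name)


rows = [
    {"id": "1", "status__c": "Open"},   # ingested before the rename
    {"id": "2", "state__c": "Closed"},  # ingested after the rename
]
values = [read_renamed_field(row, "status__c", "state__c") for row in rows]
print(values)
# ['Open', 'Closed']
```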
System table support
The initial release supports ingestion from a fixed set of __sys tables:
- DOCUMENT_VERSION
- DOCUMENT_RELATIONSHIP
- PICKLIST
- WORKFLOW
- WORKFLOW_ITEM
- WORKFLOW_TASK
- WORKFLOW_TASK_ITEM
- ACTIVE_LEGACY_WORKFLOW
- ACTIVE_LEGACY_WORKFLOW_TASK
- INACTIVE_LEGACY_WORKFLOW
- INACTIVE_LEGACY_WORKFLOW_TASK
Other __sys tables are not ingested in this release. A subsequent release will expand support to all system tables available in your Vault. After that release, a full refresh will be required to ingest the newly supported tables.
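To see which requested system tables fall outside the supported set, you could compare a planned selection against the names listed above. The following is a minimal sketch: the `split_supported` helper and the `AUDIT_TRAIL` name are hypothetical, and the case-insensitive match is an assumption.

```python
# System tables supported in the initial release, as listed above.
SUPPORTED_SYS_TABLES = frozenset({
    "DOCUMENT_VERSION",
    "DOCUMENT_RELATIONSHIP",
    "PICKLIST",
    "WORKFLOW",
    "WORKFLOW_ITEM",
    "WORKFLOW_TASK",
    "WORKFLOW_TASK_ITEM",
    "ACTIVE_LEGACY_WORKFLOW",
    "ACTIVE_LEGACY_WORKFLOW_TASK",
    "INACTIVE_LEGACY_WORKFLOW",
    "INACTIVE_LEGACY_WORKFLOW_TASK",
})


def split_supported(requested):
    """Split requested table names into (supported, unsupported) lists."""
    supported = [t for t in requested if t.upper() in SUPPORTED_SYS_TABLES]
    unsupported = [t for t in requested if t.upper() not in SUPPORTED_SYS_TABLES]
    return supported, unsupported


# AUDIT_TRAIL is a hypothetical name used only to show the unsupported case.
print(split_supported(["WORKFLOW", "PICKLIST", "AUDIT_TRAIL"]))
# (['WORKFLOW', 'PICKLIST'], ['AUDIT_TRAIL'])
```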