Maintain SQL Server ingestion pipelines

Preview

The Microsoft SQL Server connector is in Public Preview.

This page describes ongoing operations for maintaining SQL Server ingestion pipelines.

General pipeline maintenance

The pipeline maintenance tasks in this section apply to all managed connectors in Lakeflow Connect.

Fully refresh target tables

Fully refreshing your ingestion pipeline clears the data and state of your target tables, then reprocesses all records from the data source.

  • To fully refresh selected tables:

    a. In the sidebar of the Databricks workspace, click Pipelines.
    b. Select the pipeline.
    c. On the pipeline details page, click Select tables for refresh for the ingestion pipeline.
    d. Select the desired tables, and then click Full refresh selection.

  • To fully refresh all tables in the ingestion pipeline, click the drop-down menu next to the Start button, and then click Full refresh all.
Important

The ingestion pipeline update might fail during the Initializing or Resetting tables phase. Lakeflow Connect automatically retries the pipeline several times. If the automatic retries are interrupted or ultimately fail, manually start a new pipeline update with the same table refresh selection. Skipping this step can leave the target tables in an inconsistent state with partial data. If manual retries also fail, create a support ticket.
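If you drive refreshes programmatically rather than through the UI, the steps above can be sketched against the Pipelines REST API's "start update" endpoint. This is a minimal sketch, assuming the standard `POST /api/2.0/pipelines/{pipeline_id}/updates` endpoint with its `full_refresh` and `full_refresh_selection` fields; the workspace URL, token, and pipeline ID below are placeholders.

```python
import json
from urllib.request import Request, urlopen

def build_full_refresh_request(host, token, pipeline_id, tables=None):
    """Build a 'start update' request for the Pipelines API.

    tables=None requests a full refresh of all tables; otherwise only
    the listed tables are fully refreshed (full_refresh_selection).
    """
    if tables is None:
        payload = {"full_refresh": True}               # refresh every table
    else:
        payload = {"full_refresh_selection": tables}   # refresh only these tables
    return Request(
        url=f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Example with hypothetical workspace values; send with urlopen(req) to start the update.
req = build_full_refresh_request(
    "https://example.cloud.databricks.com", "dapi-REDACTED",
    "abcd-1234", tables=["main.sales.orders"])
```

If such a programmatic update fails, the same guidance as above applies: resubmit the update with the same table selection rather than leaving the refresh incomplete.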

Change the ingestion pipeline schedule

  1. In the sidebar of the Databricks workspace, click Pipelines.
  2. Select the pipeline, and then click Schedule.
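Schedules can also be managed outside the UI. A pipeline schedule is backed by a job that triggers the pipeline, so one hedged sketch is a Jobs API job spec with a Quartz cron expression and a `pipeline_task`; the pipeline ID and cron expression below are placeholder values, not taken from this page.

```python
def build_schedule_payload(pipeline_id, cron, timezone="UTC"):
    """Assemble a Jobs API job spec that runs a pipeline on a cron schedule."""
    return {
        "name": f"Ingestion schedule for {pipeline_id}",
        "schedule": {
            "quartz_cron_expression": cron,  # e.g. "0 0 2 * * ?" = daily at 02:00
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
        "tasks": [{
            "task_key": "run_pipeline",
            "pipeline_task": {"pipeline_id": pipeline_id},
        }],
    }

# Hypothetical example: schedule pipeline "abcd-1234" to run daily at 02:00 UTC.
payload = build_schedule_payload("abcd-1234", "0 0 2 * * ?")
```

Posting a spec like this to the Jobs API's create endpoint (or editing the existing job) changes when the ingestion pipeline runs.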

Customize alerts and notifications

Lakeflow Connect automatically sets up notifications for all ingestion pipelines and scheduling jobs. You can customize notifications in the UI or using the Pipelines API.

  1. In the sidebar of the Databricks workspace, click Pipelines.
  2. Select your pipeline.
  3. Click Schedule.
  4. If you already have a schedule that you want to receive notifications for:
     a. Identify the schedule in the list.
     b. Click the kebab menu, and then click Edit.
     c. Click More options, and then add your notifications.
  5. If you need a new schedule:
     a. Click Add schedule.
     b. Configure your schedule.
     c. Click More options, and then add your notifications.
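To customize notifications through the Pipelines API instead of the UI, you update the pipeline spec's `notifications` block. A minimal sketch, assuming the documented shape of that block (`email_recipients` plus `alerts` such as `on-update-failure`); the email address is a placeholder.

```python
def build_notification_settings(emails,
                                alerts=("on-update-failure",
                                        "on-update-fatal-failure")):
    """Assemble the `notifications` block of a pipeline spec.

    Each entry pairs a list of email recipients with the alert types
    that should trigger a message to them.
    """
    return {"notifications": [{
        "email_recipients": list(emails),
        "alerts": list(alerts),
    }]}

# Hypothetical example: alert the on-call address on any update failure.
settings = build_notification_settings(["oncall@example.com"])
```

Merging this block into the pipeline spec (via the Pipelines API edit endpoint) applies the notification settings to future runs.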

Specify tables to ingest

The Pipelines API provides two methods to specify tables to ingest in the objects field of the ingestion_definition:

  • Table specification: Ingests an individual table from the specified source catalog and schema to the specified destination catalog and schema.
  • Schema specification: Ingests all tables from the specified source catalog and schema into the specified destination catalog and schema.

If you choose to ingest an entire schema, you should review the limitations on the number of tables per pipeline for your connector.
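The two specification styles can be illustrated with a sketch of the `objects` field inside an `ingestion_definition`. The field names follow the Pipelines API's ingestion spec; the connection name and catalog/schema/table names are hypothetical.

```python
# Sketch of an ingestion_definition mixing both specification styles.
ingestion_definition = {
    "connection_name": "sqlserver_connection",  # hypothetical connection name
    "objects": [
        # Table specification: ingest a single table.
        {"table": {
            "source_catalog": "sales_db",
            "source_schema": "dbo",
            "source_table": "orders",
            "destination_catalog": "main",
            "destination_schema": "sales",
        }},
        # Schema specification: ingest every table in the schema.
        {"schema": {
            "source_catalog": "sales_db",
            "source_schema": "dbo",
            "destination_catalog": "main",
            "destination_schema": "sales",
        }},
    ],
}
```

With the schema specification, every table in `sales_db.dbo` counts toward the connector's per-pipeline table limit, which is why the limit is worth checking first.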

Connector-specific pipeline maintenance

The pipeline maintenance tasks in this section are specific to the SQL Server connector.

Remove unused staging files

For ingestion pipelines created after January 6, 2025, staging data in the volume is automatically scheduled for deletion after 25 days and physically removed after 30 days. If an ingestion pipeline has not completed successfully for 25 days or more, this cleanup can cause data gaps in the destination tables. To avoid gaps, trigger a full refresh of the target tables.

For ingestion pipelines created before January 6, 2025, contact Databricks Support to request manual enablement of automatic retention management for staging CDC data.

The following data is automatically cleaned up:

  • CDC data files
  • Snapshot files
  • Staging table data

Restart the ingestion gateway

To reduce the load on the source database, the ingestion gateway only periodically checks for new tables, so it can take up to 6 hours for a new table to be discovered. To speed up discovery, restart the gateway.
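Programmatically, restarting the gateway amounts to starting a new update on the gateway pipeline. A hedged sketch, assuming the standard Pipelines API "start update" endpoint with no options; the workspace URL, token, and gateway pipeline ID are placeholders.

```python
import json
from urllib.request import Request, urlopen

def build_gateway_restart_request(host, token, gateway_pipeline_id):
    """Build a request that starts a new update on the gateway pipeline,
    prompting it to re-scan the source database for new tables."""
    return Request(
        url=f"{host}/api/2.0/pipelines/{gateway_pipeline_id}/updates",
        data=json.dumps({}).encode(),  # empty body: a plain restart, no options
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical example; send with urlopen(req) to trigger the restart.
req = build_gateway_restart_request(
    "https://example.cloud.databricks.com", "dapi-REDACTED", "gw-5678")
```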