Smart closure for integrated CDC pipelines

Integrated CDC pipelines run in triggered mode. Each update extracts changes from the source database, applies them to the destination tables, and then stops. Smart closure is the policy that decides when an update has done enough work to stop. Instead of running every update for a fixed amount of time, smart closure lets the duration of each update adapt to how much change data the source has.

Note: Smart closure applies to integrated CDC pipelines (also called direct CDC), which run the change extractor and the applier together in a single triggered pipeline rather than as separate components. It does not apply to the standard gateway-based architecture, where the ingestion gateway runs continuously. See Create an integrated CDC pipeline for SQL Server.

When an update stops

An update stops when either of the following conditions is met:

Condition	Completion reason	What it means
Caught up with the source	`lag-converged`	The update has applied the pending change backlog and is close to current with the source. This is the common case for incremental updates that have little to process.
Reached the runtime limit	`max-runtime-cap-hit`	The source had a large backlog (for example, because the pipeline had not run for a long time), so the update stopped after a bounded runtime. The next scheduled update resumes where this one stopped.
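
The two stop conditions can be read as a simple decision rule. The following Python sketch is illustrative only: the function name, its four parameters, and the order in which the conditions are checked are assumptions made for this example, and the actual inputs and thresholds that smart closure uses are not described on this page. Only the two completion reason strings come from the table above.

```python
from typing import Optional

# Completion reasons as listed in the table above.
LAG_CONVERGED = "lag-converged"
MAX_RUNTIME_CAP_HIT = "max-runtime-cap-hit"


def completion_reason(
    lag_seconds: float,
    elapsed_seconds: float,
    lag_threshold_seconds: float,
    max_runtime_seconds: float,
) -> Optional[str]:
    """Return a completion reason if the update should stop, else None.

    All four parameters are hypothetical stand-ins; the real policy's
    inputs and thresholds are not documented here.
    """
    # Assumption: catching up is checked before the runtime cap.
    if lag_seconds <= lag_threshold_seconds:
        return LAG_CONVERGED
    if elapsed_seconds >= max_runtime_seconds:
        return MAX_RUNTIME_CAP_HIT
    # Neither condition is met, so the update keeps running.
    return None
```

In this sketch, an update with a small backlog returns `lag-converged` almost immediately, while an update facing a large backlog keeps returning `None` until the elapsed time reaches the cap.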

How smart closure helps

Lower cost and faster updates when there's little change: An update ends as soon as it has caught up instead of running for a fixed duration, so incremental updates use less compute time.
Bounded, predictable runtime: A large backlog can't make a single update run indefinitely. Each update is capped, and large workloads are spread across subsequent scheduled updates.
Visibility into completion: Each update records why it ended, so you can tell whether it caught up with the source or stopped at the runtime limit.

Observe update completion

The completion reason appears in the message of the COMPLETED event in the pipeline event log. An update that caught up with the source completes with reason `lag-converged`, and an update that stopped at the runtime limit completes with reason `max-runtime-cap-hit`.

To find the completion reason, query the pipeline event log for the extractor's COMPLETED event. Replace <pipeline-id> with your pipeline ID:

SQL
SELECT timestamp, message
FROM event_log('<pipeline-id>')
WHERE message LIKE '%Direct Cdc Extraction has COMPLETED%'
ORDER BY timestamp DESC

The event message embeds the reason, for example `Direct Cdc Extraction has COMPLETED (reason=lag-converged)`.
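
If you export the query results and want to tally how often updates caught up versus hit the runtime limit, the reason can be pulled out of each message with a regular expression. This is a minimal sketch that assumes every message follows the shape of the example above. The function names are hypothetical, and messages in any other shape are skipped.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Pattern based on the example message on this page; other message
# shapes are not covered.
_REASON_PATTERN = re.compile(
    r"Direct Cdc Extraction has COMPLETED \(reason=([^)]+)\)"
)


def extract_completion_reason(message: str) -> Optional[str]:
    """Return the reason embedded in a COMPLETED message, or None."""
    match = _REASON_PATTERN.search(message)
    return match.group(1) if match else None


def count_completion_reasons(messages: Iterable[str]) -> Counter:
    """Count completion reasons across messages, skipping non-matches."""
    reasons = (extract_completion_reason(m) for m in messages)
    return Counter(r for r in reasons if r is not None)
```

A run of updates that mostly end with `max-runtime-cap-hit` suggests the pipeline is not keeping up with the source at its current schedule.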

Schedule recurring updates

Because update duration varies with how much change data the source has, a large backlog might not finish in a single update. To ingest data on a recurring schedule, create a Lakeflow Jobs task that runs the pipeline, and schedule it frequently enough for subsequent updates to catch up. An interval of 60 minutes is a good starting point for most workloads. If a trigger fires while a previous update is still running, the pipeline queues the new update.
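
As a rough illustration of an hourly schedule, the sketch below builds job settings in the shape used by the Jobs API: a pipeline task plus a Quartz cron schedule. The function name, job name, task key, and time zone are placeholders chosen for this example, and the field names are assumptions to verify against the Jobs API reference for your workspace before use.

```python
import json
from typing import Any, Dict


def hourly_pipeline_job(pipeline_id: str, job_name: str) -> Dict[str, Any]:
    """Build job settings that trigger a pipeline update every 60 minutes.

    The job name and task key are hypothetical; confirm field names
    against the Jobs API reference before sending this payload.
    """
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_cdc_pipeline",  # placeholder task key
                "pipeline_task": {"pipeline_id": pipeline_id},
            }
        ],
        "schedule": {
            # Quartz fields: second, minute, hour, day-of-month, month,
            # day-of-week. This fires at minute 0 of every hour.
            "quartz_cron_expression": "0 0 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
    }


if __name__ == "__main__":
    settings = hourly_pipeline_job("<pipeline-id>", "cdc-hourly-ingest")
    print(json.dumps(settings, indent=2))
```

If updates frequently stop at the runtime limit, a shorter interval gives subsequent updates more chances to work through the backlog.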
