Smart closure for integrated CDC pipelines
Integrated CDC pipelines run in triggered mode. Each update extracts changes from the source database, applies them to the destination tables, and then stops. Smart closure is the policy that decides when an update has done enough work to stop. Instead of running every update for a fixed amount of time, smart closure lets the duration of each update adapt to how much change data the source has.
Smart closure applies to integrated CDC pipelines (also called direct CDC), which run the change extractor and the applier together in a single triggered pipeline rather than as separate components. It does not apply to the standard gateway-based architecture, where the ingestion gateway runs continuously. See Create an integrated CDC pipeline for SQL Server.
When an update stops
An update stops when either of the following conditions is met:
| Condition | Completion reason | What it means |
|---|---|---|
| Caught up with the source | lag-converged | The update has applied the pending change backlog and is close to current with the source. This is the common case for incremental updates that have little to process. |
| Reached the runtime limit | max-runtime-cap-hit | The source had a large backlog (for example, because the pipeline had not run for a long time), so the update stopped after a bounded runtime. The next scheduled update resumes where this one stopped. |
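The two stop conditions can be pictured as a small decision function. This is only an illustrative sketch: the function name, the lag threshold, the runtime limit, and the order in which the conditions are checked are all assumptions made for the example, not documented settings or behavior. Only the two reason strings come from this page.

```python
from typing import Optional

# Reason strings as they appear in the pipeline event log.
LAG_CONVERGED = "lag-converged"
MAX_RUNTIME_CAP_HIT = "max-runtime-cap-hit"


def completion_reason(
    lag_seconds: float,
    elapsed_seconds: float,
    lag_threshold_seconds: float = 60.0,   # hypothetical value
    max_runtime_seconds: float = 3600.0,   # hypothetical value
) -> Optional[str]:
    """Return the reason an update should stop, or None to keep running."""
    # Condition 1: the update is close to current with the source.
    if lag_seconds <= lag_threshold_seconds:
        return LAG_CONVERGED
    # Condition 2: the update has used its bounded runtime.
    if elapsed_seconds >= max_runtime_seconds:
        return MAX_RUNTIME_CAP_HIT
    # Neither condition is met, so the update keeps processing changes.
    return None
```

With these assumed values, `completion_reason(5, 120)` returns the caught-up reason after two minutes, while an update that is still far behind after an hour stops with the runtime-limit reason.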
How smart closure helps
- Lower cost and faster updates when there's little change: An update ends as soon as it has caught up instead of running for a fixed duration, so incremental updates use less compute time.
- Bounded, predictable runtime: A large backlog can't make a single update run indefinitely. Each update is capped, and large workloads are spread across subsequent scheduled updates.
- Visibility into completion: Each update records why it ended, so you can tell whether it caught up with the source or stopped at the runtime limit.
Observe update completion
The completion reason appears in the message of the COMPLETED event in the pipeline event log. An update that caught up with the source completes with reason lag-converged, and an update that stopped at the runtime limit completes with reason max-runtime-cap-hit.
To find the completion reason, query the pipeline event log for the extractor's COMPLETED event. Replace <pipeline-id> with your pipeline ID:
SELECT timestamp, message
FROM event_log('<pipeline-id>')
WHERE message LIKE '%Direct Cdc Extraction has COMPLETED%'
ORDER BY timestamp DESC
The event message embeds the reason, for example Direct Cdc Extraction has COMPLETED (reason=lag-converged).
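If you export the query results, for example to a notebook, you can extract the reason from each message and count how often each one occurs. The following is a minimal sketch. It assumes every message follows the `(reason=...)` pattern of the example above, and the helper names `extract_reason` and `summarize_reasons` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the "(reason=...)" suffix shown in the example message.
_REASON_RE = re.compile(r"\(reason=([a-z-]+)\)")


def extract_reason(message: str) -> Optional[str]:
    """Return the completion reason embedded in a message, or None if absent."""
    match = _REASON_RE.search(message)
    return match.group(1) if match else None


def summarize_reasons(messages: Iterable[str]) -> Counter:
    """Count completion reasons, skipping messages that carry no reason."""
    reasons = (extract_reason(m) for m in messages)
    return Counter(r for r in reasons if r is not None)
```

A run of updates that all end with max-runtime-cap-hit suggests that the pipeline is still working through a backlog and has not yet caught up with the source.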
Schedule recurring updates
Because update duration varies with how much change data the source has, a large backlog might not finish in a single update. To ingest data on a recurring schedule, create a Lakeflow Jobs task that runs the pipeline, and schedule it frequently enough for subsequent updates to catch up. A 60-minute interval is a reasonable starting point for most workloads. If a trigger fires while a previous update is still running, the pipeline queues the new update.
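To reason about whether a schedule is frequent enough, it can help to estimate how many scheduled updates a backlog needs. The sketch below uses a deliberately simple model that is an assumption, not documented behavior: each update processes at most a fixed number of changes before reaching the runtime limit, and a fixed number of new changes arrives between updates. The function name and all quantities are hypothetical.

```python
from typing import Optional


def updates_to_catch_up(
    backlog: int,
    capacity_per_update: int,
    arrivals_per_interval: int,
) -> Optional[int]:
    """Estimate how many scheduled updates are needed to clear a backlog.

    Returns None when updates can never catch up under this model.
    """
    # If the first update cannot clear the backlog and each update processes
    # no more than what arrives between updates, the backlog never shrinks.
    if backlog > capacity_per_update and capacity_per_update <= arrivals_per_interval:
        return None

    pending = backlog
    updates = 0
    while True:
        updates += 1
        # One update processes up to its capacity, then stops.
        pending -= min(pending, capacity_per_update)
        if pending == 0:
            return updates
        # New changes accumulate before the next scheduled update.
        pending += arrivals_per_interval
```

For example, under this model a backlog of 1,000 changes with a capacity of 300 changes per update and 100 new changes per interval takes five updates to clear. If the estimate is `None`, updates cannot keep up at that interval.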