Load and process data incrementally with Lakeflow pipeline flows

Data is processed in pipelines through flows. Each flow consists of a query and, typically, a target. The flow processes the query, either as a batch, or incrementally as a stream of data into the target. A flow lives within a Lakeflow pipeline.

Typically, flows are defined automatically when you create a query in a pipeline that updates a target, but you can also explicitly define additional flows for more complex processing, such as appending to a single target from multiple sources.

Updates

A flow is run each time that its defining pipeline is updated. The flow will create or update tables with the most recent data available. Depending on the type of flow and the state of the changes to the data, the update may perform an incremental refresh, which processes only new records, or perform a full refresh, that reprocesses all records from the data source.

For more information about pipeline updates, see Run a pipeline update.
For more information about scheduling and triggering updates, see Triggered vs. continuous pipeline mode.

Default flows and append flows

When you create a query in a pipeline that updates a target, a default flow is defined automatically. For a streaming table, the default flow is an append flow that adds new rows with each update, and it has the same name as the target. Creating a flow and its target in a single step is the most common way to use pipelines, and you can use it to ingest or transform data.

You can also define flows separately from a target, which lets multiple flows append data to a single target. This is useful when you need to:

Add streaming sources that append to an existing streaming table without requiring a full refresh.
Backfill a streaming table with missing historical data.
Combine data from multiple sources without using a UNION clause.

For examples of creating default and explicit flows, see Use flows in Lakeflow pipelines.

Types of flows

The default flows for streaming tables and materialized views are append flows. You can also create flows to read from change data capture data sources. The following table describes the different types of flows.

Flow type	Description
Append	Append flows are the most common type of flow, where new records in the source are written to the target with each update. They correspond to append mode in structured streaming. You can add the `ONCE` flag, indicating a batch query whose data should be inserted into the target only once, unless the target is fully refreshed. Any number of append flows can write to a particular target. Default flows (created with the target streaming table or materialized view) will have the same name as the target. Other targets do not have default flows.
Auto CDC (previously apply changes)	An Auto CDC flow ingests a query containing change data capture (CDC) data. Auto CDC flows can only target streaming tables, and the source must be a streaming source (even in the case of `ONCE` flows). Multiple auto CDC flows can target a single streaming table. A streaming table that acts as a target for an auto CDC flow can only be targeted by other auto CDC flows. For more information about CDC data, see The AUTO CDC APIs: Simplify change data capture with pipelines.
Update (Public Preview)	Update flows output global, non-watermarked streaming aggregates to a sink, emitting only the records that changed in each batch. Update flows are only available in Python. See update_flow.

Flow type

Description

Append

Append flows are the most common type of flow, where new records in the source are written to the target with each update. They correspond to append mode in structured streaming. You can add the ONCE flag, indicating a batch query whose data should be inserted into the target only once, unless the target is fully refreshed. Any number of append flows can write to a particular target.

Default flows (created with the target streaming table or materialized view) will have the same name as the target. Other targets do not have default flows.

Auto CDC (previously apply changes)

An Auto CDC flow ingests a query containing change data capture (CDC) data. Auto CDC flows can only target streaming tables, and the source must be a streaming source (even in the case of ONCE flows). Multiple auto CDC flows can target a single streaming table. A streaming table that acts as a target for an auto CDC flow can only be targeted by other auto CDC flows.

For more information about CDC data, see The AUTO CDC APIs: Simplify change data capture with pipelines.

Update (Public Preview)

Update flows output global, non-watermarked streaming aggregates to a sink, emitting only the records that changed in each batch.

Update flows are only available in Python. See update_flow.

Flow type	Description
Append	Append flows are the most common type of flow, where new records in the source are written to the target with each update. They correspond to append mode in structured streaming. You can add the `ONCE` flag, indicating a batch query whose data should be inserted into the target only once, unless the target is fully refreshed. Any number of append flows can write to a particular target. Default flows (created with the target streaming table or materialized view) will have the same name as the target. Other targets do not have default flows.
Auto CDC (previously apply changes)	An Auto CDC flow ingests a query containing change data capture (CDC) data. Auto CDC flows can only target streaming tables, and the source must be a streaming source (even in the case of `ONCE` flows). Multiple auto CDC flows can target a single streaming table. A streaming table that acts as a target for an auto CDC flow can only be targeted by other auto CDC flows. For more information about CDC data, see The AUTO CDC APIs: Simplify change data capture with pipelines.
Update (Public Preview)	Update flows output global, non-watermarked streaming aggregates to a sink, emitting only the records that changed in each batch. Update flows are only available in Python. See update_flow.

Flow type

Description

Append

Default flows (created with the target streaming table or materialized view) will have the same name as the target. Other targets do not have default flows.

Auto CDC (previously apply changes)

For more information about CDC data, see The AUTO CDC APIs: Simplify change data capture with pipelines.

Update (Public Preview)

Update flows output global, non-watermarked streaming aggregates to a sink, emitting only the records that changed in each batch.

Update flows are only available in Python. See update_flow.

Additional resources

For more information on flows and their use, see the following topics:

Updates​

Default flows and append flows​

Types of flows​

Additional resources​

Updates

Default flows and append flows

Types of flows

Additional resources