Skip to main content

Select columns to ingest

Applies to: check marked yes API-based pipeline authoring

By default, managed connectors in Lakeflow Connect ingest all current and future columns in the specified tables. Optionally use one of the following table configuration properties in your pipeline definition to select or deselect specific columns for ingestion:

Property

Description

include_columns

Optionally specify a list of columns to include for ingestion. If you use this option to explicitly include columns, the pipeline automatically excludes columns that are added to the source in the future. To ingest the future columns, you must add them to the list.

exclude_columns

Optionally specify a list of columns to exclude from ingestion. If you use this option to explicitly exclude columns, the pipeline automatically includes columns that are added to the source in the future.

The example pipeline definitions on this page show how to select three specific columns for ingestion, depending on the pipeline creation interface. To deselect specific columns instead, specify exclude_columns in the table configuration.

Example: Google Analytics

YAML
resources:
pipelines:
pipeline_ga4:
name: <pipeline>
catalog: <target-catalog>
schema: <target-schema>
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_url: <project-id>
source_schema: <property-name>
destination_catalog: <destination-catalog>
destination_schema: <destination-schema>
table_configuration:
include_columns:
- <column_a>
- <column_b>
- <column_c>

Example: Salesforce

YAML
resources:
pipelines:
pipeline_sfdc:
name: <pipeline>
catalog: <target-catalog>
schema: <target-schema>
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog>
destination_schema: <destination-schema>
table_configuration:
include_columns:
- <column_a>
- <column_b>
- <column_c>

Example: Workday

YAML
resources:
pipelines:
pipeline_workday:
name: <pipeline>
catalog: <target-catalog>
schema: <target-schema>
ingestion_definition:
connection_name: <connection>
objects:
- report:
source_url: <report-url>
destination_catalog: <destination-catalog>
destination_schema: <destination-schema>
table_configuration:
include_columns:
- <column_a>
- <column_b>
- <column_c>