Select columns to ingest
Applies to: API-based pipeline authoring
By default, managed connectors in Lakeflow Connect ingest all current and future columns in the specified tables. To select or deselect specific columns for ingestion, set one of the following table configuration properties in your pipeline definition:
| Property | Description |
|---|---|
| include_columns | Optionally specify a list of columns to include for ingestion. If you use this option to explicitly include columns, the pipeline automatically excludes columns that are added to the source in the future. To ingest those future columns, you must add them to the list. |
| exclude_columns | Optionally specify a list of columns to exclude from ingestion. If you use this option to explicitly exclude columns, the pipeline automatically includes columns that are added to the source in the future. |
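The difference between the two properties shows up when the source schema changes. The following Python sketch models the behavior described in the table; it is an illustration only, not the connector's implementation. The function name `columns_to_ingest` and the sample column names are hypothetical, and rejecting both properties at once is an assumption based on the "use one of" wording above.

```python
def columns_to_ingest(source_columns, include_columns=None, exclude_columns=None):
    """Illustrative model: return the source columns that would be ingested."""
    if include_columns is not None and exclude_columns is not None:
        # Assumption: the two properties are alternatives, so reject both at once.
        raise ValueError("Specify include_columns or exclude_columns, not both")
    if include_columns is not None:
        # Allowlist: only listed columns are ingested, so new columns are skipped.
        allowed = set(include_columns)
        return [c for c in source_columns if c in allowed]
    if exclude_columns is not None:
        # Blocklist: everything except listed columns, so new columns are ingested.
        blocked = set(exclude_columns)
        return [c for c in source_columns if c not in blocked]
    # Default: all current and future columns.
    return list(source_columns)


before = ["id", "name", "email"]
after = before + ["phone"]  # a column added to the source later

print(columns_to_ingest(after, include_columns=["id", "name"]))  # ['id', 'name']
print(columns_to_ingest(after, exclude_columns=["email"]))       # ['id', 'name', 'phone']
print(columns_to_ingest(after))                                  # ['id', 'name', 'email', 'phone']
```

With `include_columns`, the new `phone` column is skipped until it is added to the list. With `exclude_columns`, it is picked up automatically.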
The example pipeline definitions on this page show how to select three specific columns for ingestion, with one definition for each pipeline creation interface. To deselect specific columns instead, specify exclude_columns in the table configuration.
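As a minimal sketch of the deselect case, the following Python snippet builds a notebook-style pipeline specification that uses `exclude_columns` instead of `include_columns`. The surrounding fields mirror the notebook examples on this page. Every angle-bracket value is a placeholder to replace, and choosing two columns to exclude is arbitrary.

```python
import json

# Table object with a blocklist: every column except these two is ingested,
# including columns added to the source later.
table_spec = {
    "source_schema": "<source-schema>",
    "source_table": "<source-table>",
    "destination_catalog": "<target-catalog>",
    "destination_schema": "<target-schema>",
    "table_configuration": {
        "exclude_columns": ["<column_a>", "<column_b>"],
    },
}

# Serialize to the same JSON string shape used by the notebook examples.
pipeline_spec = json.dumps(
    {
        "name": "<pipeline>",
        "ingestion_definition": {
            "connection_name": "<connection>",
            "objects": [{"table": table_spec}],
        },
    },
    indent=2,
)

print(pipeline_spec)
```

Building the definition as a dictionary and serializing it with `json.dumps` avoids hand-editing quotes and commas in a long string.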
Example: Google Analytics
Databricks Asset Bundles:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_url: <project-id>
              source_schema: <property-name>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>
Databricks notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""
Databricks CLI:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog>",
        "schema": "<target-schema>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_url": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<destination-catalog>",
                "destination_schema": "<destination-schema>",
                "table_configuration": {
                  "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                }
              }
            }
          ]
        }
      }
    }
  }
}
Example: Salesforce
Databricks Asset Bundles:
resources:
  pipelines:
    pipeline_sfdc:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>
Databricks notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""
Databricks CLI:
{
  "resources": {
    "pipelines": {
      "pipeline_sfdc": {
        "name": "<pipeline>",
        "catalog": "<target-catalog>",
        "schema": "<target-schema>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<destination-catalog>",
                "destination_schema": "<destination-schema>",
                "table_configuration": {
                  "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                }
              }
            }
          ]
        }
      }
    }
  }
}
Example: Workday
Databricks Asset Bundles:
resources:
  pipelines:
    pipeline_workday:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - report:
              source_url: <report-url>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>
Databricks notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""
Databricks CLI:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog>",
        "schema": "<target-schema>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<destination-catalog>",
                "destination_schema": "<destination-schema>",
                "table_configuration": {
                  "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                }
              }
            }
          ]
        }
      }
    }
  }
}
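Because a pipeline that uses include_columns skips columns added to the source later, the list has to be updated by hand when a new column should be ingested. The following Python sketch shows one way to do that for a notebook-style `pipeline_spec` string. The helper name `add_included_column` and the column `<column_d>` are hypothetical. The sketch only edits the JSON text; applying the updated definition to an existing pipeline is a separate step that is not covered here.

```python
import json


def add_included_column(pipeline_spec: str, column: str) -> str:
    """Return a copy of the spec with `column` appended to every include_columns list."""
    spec = json.loads(pipeline_spec)
    for obj in spec["ingestion_definition"]["objects"]:
        # In the examples on this page, each object has one key: "table" or "report".
        for source in obj.values():
            include = source.get("table_configuration", {}).get("include_columns")
            # Leave objects without include_columns untouched and avoid duplicates.
            if include is not None and column not in include:
                include.append(column)
    return json.dumps(spec, indent=2)


pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""

updated = add_included_column(pipeline_spec, "<column_d>")
print(updated)
```

Objects that use exclude_columns, or that have no column selection at all, pass through unchanged, since they already ingest new columns.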