Select columns to ingest

Applies to: API-based pipeline authoring, SaaS connectors, database connectors

By default, managed connectors in Lakeflow Connect ingest all current and future columns in the specified tables. To ingest only some columns, set one of the following table configuration properties in your pipeline definition:

Property	Description
`include_columns`	A list of columns to ingest. Columns that are added to the source later are not ingested. To ingest them, you must add them to this list.
`exclude_columns`	A list of columns to skip. All other columns are ingested, including columns that are added to the source later.
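The include and exclude rules in the table above can be modeled with a small pure-Python function. It is useful for reasoning about which columns a table ingests as the source evolves.

The function name `resolve_ingested_columns` is hypothetical and is not part of any Databricks API. The sketch makes two assumptions:
- Ingested columns keep their source order.
- Setting both properties is an error, following the "use one of" wording above.

```python
from typing import Optional, Sequence


def resolve_ingested_columns(
    source_columns: Sequence[str],
    include_columns: Optional[Sequence[str]] = None,
    exclude_columns: Optional[Sequence[str]] = None,
) -> list[str]:
    """Hypothetical model of column selection; not a Databricks API."""
    if include_columns is not None and exclude_columns is not None:
        # Assumption: the two properties are mutually exclusive.
        raise ValueError("set include_columns or exclude_columns, not both")
    if include_columns is not None:
        # Only listed columns are ingested; columns added later are skipped.
        wanted = set(include_columns)
        return [c for c in source_columns if c in wanted]
    if exclude_columns is not None:
        # Everything except the listed columns, including columns added later.
        skipped = set(exclude_columns)
        return [c for c in source_columns if c not in skipped]
    # Default: all current and future columns.
    return list(source_columns)


# A column ("phone") appears in the source after the pipeline was created.
before = ["id", "name", "email"]
after = before + ["phone"]
print(resolve_ingested_columns(after, include_columns=["id", "name"]))  # ['id', 'name']
print(resolve_ingested_columns(after, exclude_columns=["email"]))  # ['id', 'name', 'phone']
```

The new `phone` column is dropped under `include_columns` but picked up under `exclude_columns`. This is the practical difference between the two properties.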

The example pipeline definitions on this page select three specific columns for ingestion. Each example is shown for three pipeline creation interfaces: Databricks Asset Bundles, Databricks notebooks, and the Databricks CLI. To deselect specific columns instead, specify `exclude_columns` in the table configuration.
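For example, the following notebook-style definition deselects two columns instead of selecting three. The source fields follow the Salesforce example on this page; other connectors use different source fields. `<column_d>` and `<column_e>` are placeholder column names.

Parsing the string with `json.loads` is only a local syntax check. It does not validate the definition against the service.

```python
import json

# Pipeline definition that ingests every column except two.
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "exclude_columns": ["<column_d>", "<column_e>"]
          }
        }
      }
    ]
  }
}
"""

# json.loads raises json.JSONDecodeError if the string is not valid JSON.
spec = json.loads(pipeline_spec)
table = spec["ingestion_definition"]["objects"][0]["table"]
print(table["table_configuration"])  # {'exclude_columns': ['<column_d>', '<column_e>']}
```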

Example: Google Analytics

Databricks Asset Bundles (YAML)
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>

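Databricks notebook (Python)
# Pipeline definition as a JSON string. Replace each <placeholder> with your own value.
pipeline_spec = """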
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""

Databricks CLI (JSON)
{
  "name": "<pipeline>",
  "catalog": "<target-catalog>",
  "schema": "<target-schema>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}

Example: Salesforce

Databricks Asset Bundles (YAML)
resources:
  pipelines:
    pipeline_sfdc:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>

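Databricks notebook (Python)
# Pipeline definition as a JSON string. Replace each <placeholder> with your own value.
pipeline_spec = """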
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""

Databricks CLI (JSON)
{
  "name": "<pipeline>",
  "catalog": "<target-catalog>",
  "schema": "<target-schema>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}

Example: Workday

Databricks Asset Bundles (YAML)
resources:
  pipelines:
    pipeline_workday:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - report:
              source_url: <report-url>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                include_columns:
                  - <column_a>
                  - <column_b>
                  - <column_c>

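Databricks notebook (Python)
# Pipeline definition as a JSON string. Replace each <placeholder> with your own value.
pipeline_spec = """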
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
"""

Databricks CLI (JSON)
{
  "name": "<pipeline>",
  "catalog": "<target-catalog>",
  "schema": "<target-schema>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "table_configuration": {
            "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
          }
        }
      }
    ]
  }
}
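Before you submit any of these definitions, a quick local check can catch two mistakes: a table configuration that sets both properties, or a column list that is not a list of names.

This is a hypothetical sketch, and `column_selection_errors` is not a Databricks function. Treating the two properties as mutually exclusive is inferred from the "use one of" wording at the top of this page. The check does not replicate the service's own validation.

```python
import json


def column_selection_errors(spec_json: str) -> list[str]:
    """Hypothetical check of include_columns / exclude_columns in a pipeline definition."""
    errors = []
    spec = json.loads(spec_json)
    for i, obj in enumerate(spec.get("ingestion_definition", {}).get("objects", [])):
        for kind, settings in obj.items():  # kind is "table", "report", and so on
            config = settings.get("table_configuration", {})
            present = [k for k in ("include_columns", "exclude_columns") if k in config]
            if len(present) > 1:
                errors.append(f"objects[{i}].{kind}: set only one of {present}")
            for key in present:
                columns = config[key]
                if not isinstance(columns, list) or not all(
                    isinstance(c, str) for c in columns
                ):
                    errors.append(f"objects[{i}].{kind}.{key}: expected a list of column names")
    return errors


good = '{"ingestion_definition": {"objects": [{"report": {"table_configuration": {"include_columns": ["a", "b"]}}}]}}'
bad = '{"ingestion_definition": {"objects": [{"table": {"table_configuration": {"include_columns": ["a"], "exclude_columns": ["b"]}}}]}}'
print(column_selection_errors(good))  # []
print(column_selection_errors(bad))
```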
