Select columns to ingest
Applies to: API-based pipeline authoring
By default, managed connectors in Lakeflow Connect ingest all current and future columns in the specified tables. Optionally, use one of the following table configuration properties in your pipeline definition to select or deselect specific columns for ingestion:
| Property | Description |
|---|---|
| include_columns | Optionally, specify a list of columns to include for ingestion. If you use this option to include columns explicitly, the pipeline automatically excludes columns that are added to the source in the future. To ingest those future columns, you must add them to the list. |
| exclude_columns | Optionally, specify a list of columns to exclude from ingestion. If you use this option to exclude columns explicitly, the pipeline automatically includes columns that are added to the source in the future. |
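To make the difference between the two properties concrete, here is a minimal Python sketch that models the selection rule described in the table. The function `columns_to_ingest` is hypothetical and is not part of any Databricks API. Treating the two properties as mutually exclusive is an assumption of this sketch.

```python
from typing import Iterable, List, Optional


def columns_to_ingest(
    source_columns: Iterable[str],
    include_columns: Optional[List[str]] = None,
    exclude_columns: Optional[List[str]] = None,
) -> List[str]:
    """Model which source columns would be ingested (illustrative only)."""
    if include_columns is not None and exclude_columns is not None:
        # Assumption: only one of the two properties is set per table.
        raise ValueError("Set include_columns or exclude_columns, not both")
    if include_columns is not None:
        allowed = set(include_columns)
        return [c for c in source_columns if c in allowed]
    if exclude_columns is not None:
        blocked = set(exclude_columns)
        return [c for c in source_columns if c not in blocked]
    # Default: every current (and therefore every future) column is ingested.
    return list(source_columns)


# The source table gains a new column, "d", after the pipeline is created.
evolved_source = ["a", "b", "c", "d"]

print(columns_to_ingest(evolved_source, include_columns=["a", "b"]))
print(columns_to_ingest(evolved_source, exclude_columns=["c"]))
print(columns_to_ingest(evolved_source))
```

With an include list, the new column "d" is left out until it is added to the list. With an exclude list or no list at all, "d" is picked up without further changes.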
The example pipeline definitions on this page show how to select three specific columns for ingestion, depending on the pipeline authoring interface. To deselect specific columns instead, specify exclude_columns in the table configuration.
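As a rough illustration of the exclude variant, the sketch below builds one table entry in Python and prints it as JSON. It reuses the field names from the examples on this page. The helper `table_object` and all argument values are hypothetical placeholders.

```python
import json
from typing import Dict, List


def table_object(
    source_schema: str,
    source_table: str,
    destination_catalog: str,
    destination_schema: str,
    exclude_columns: List[str],
) -> Dict:
    """Build one entry for ingestion_definition.objects (illustrative helper)."""
    return {
        "table": {
            "source_schema": source_schema,
            "source_table": source_table,
            "destination_catalog": destination_catalog,
            "destination_schema": destination_schema,
            "table_configuration": {
                # Columns listed here are skipped; all others are ingested.
                "exclude_columns": list(exclude_columns),
            },
        }
    }


# Placeholder values, chosen only for the example.
obj = table_object(
    source_schema="example_schema",
    source_table="example_table",
    destination_catalog="main",
    destination_schema="ingest",
    exclude_columns=["column_a", "column_b"],
)
print(json.dumps(obj, indent=2))
```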
Example: Google Analytics
Databricks Asset Bundles:

    resources:
      pipelines:
        pipeline_ga4:
          name: <pipeline>
          catalog: <target-catalog>
          schema: <target-schema>
          ingestion_definition:
            connection_name: <connection>
            objects:
              - table:
                  source_catalog: <project-id>
                  source_schema: <property-name>
                  source_table: <source-table>
                  destination_catalog: <destination-catalog>
                  destination_schema: <destination-schema>
                  table_configuration:
                    include_columns:
                      - <column_a>
                      - <column_b>
                      - <column_c>

Databricks notebook:

    pipeline_spec = """
    {
      "name": "<pipeline>",
      "ingestion_definition": {
        "connection_name": "<connection>",
        "objects": [
          {
            "table": {
              "source_catalog": "<project-id>",
              "source_schema": "<property-name>",
              "source_table": "<source-table>",
              "destination_catalog": "<target-catalog>",
              "destination_schema": "<target-schema>",
              "table_configuration": {
                "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
              }
            }
          }
        ]
      }
    }
    """

Databricks CLI:

    {
      "resources": {
        "pipelines": {
          "pipeline_ga4": {
            "name": "<pipeline>",
            "catalog": "<target-catalog>",
            "schema": "<target-schema>",
            "ingestion_definition": {
              "connection_name": "<connection>",
              "objects": [
                {
                  "table": {
                    "source_catalog": "<project-id>",
                    "source_schema": "<property-name>",
                    "source_table": "<source-table>",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>",
                    "table_configuration": {
                      "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
Example: Salesforce
Databricks Asset Bundles:

    resources:
      pipelines:
        pipeline_sfdc:
          name: <pipeline>
          catalog: <target-catalog>
          schema: <target-schema>
          ingestion_definition:
            connection_name: <connection>
            objects:
              - table:
                  source_schema: <source-schema>
                  source_table: <source-table>
                  destination_catalog: <destination-catalog>
                  destination_schema: <destination-schema>
                  table_configuration:
                    include_columns:
                      - <column_a>
                      - <column_b>
                      - <column_c>

Databricks notebook:

    pipeline_spec = """
    {
      "name": "<pipeline>",
      "ingestion_definition": {
        "connection_name": "<connection>",
        "objects": [
          {
            "table": {
              "source_catalog": "<source-catalog>",
              "source_schema": "<source-schema>",
              "source_table": "<source-table>",
              "destination_catalog": "<target-catalog>",
              "destination_schema": "<target-schema>",
              "table_configuration": {
                "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
              }
            }
          }
        ]
      }
    }
    """

Databricks CLI:

    {
      "resources": {
        "pipelines": {
          "pipeline_sfdc": {
            "name": "<pipeline>",
            "catalog": "<target-catalog>",
            "schema": "<target-schema>",
            "ingestion_definition": {
              "connection_name": "<connection>",
              "objects": [
                {
                  "table": {
                    "source_schema": "<source-schema>",
                    "source_table": "<source-table>",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>",
                    "table_configuration": {
                      "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
Example: Workday
Databricks Asset Bundles:

    resources:
      pipelines:
        pipeline_workday:
          name: <pipeline>
          catalog: <target-catalog>
          schema: <target-schema>
          ingestion_definition:
            connection_name: <connection>
            objects:
              - report:
                  source_url: <report-url>
                  destination_catalog: <destination-catalog>
                  destination_schema: <destination-schema>
                  table_configuration:
                    include_columns:
                      - <column_a>
                      - <column_b>
                      - <column_c>

Databricks notebook:

    pipeline_spec = """
    {
      "name": "<pipeline>",
      "ingestion_definition": {
        "connection_name": "<connection>",
        "objects": [
          {
            "report": {
              "source_url": "<report-url>",
              "destination_catalog": "<target-catalog>",
              "destination_schema": "<target-schema>",
              "table_configuration": {
                "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
              }
            }
          }
        ]
      }
    }
    """

Databricks CLI:

    {
      "resources": {
        "pipelines": {
          "pipeline_workday": {
            "name": "<pipeline>",
            "catalog": "<target-catalog>",
            "schema": "<target-schema>",
            "ingestion_definition": {
              "connection_name": "<connection>",
              "objects": [
                {
                  "report": {
                    "source_url": "<report-url>",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>",
                    "table_configuration": {
                      "include_columns": ["<column_a>", "<column_b>", "<column_c>"]
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
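
Before submitting a definition, it can help to check which column selection each object carries. The following is a minimal sketch, assuming the JSON layout used in the notebook examples on this page, where each object is keyed by table or report. The helper `summarize_column_selection` and the sample values are hypothetical and are not part of any Databricks API.

```python
import json
from typing import Dict, List


def summarize_column_selection(pipeline_spec: str) -> List[Dict]:
    """List the include/exclude settings of every object in a pipeline spec."""
    spec = json.loads(pipeline_spec)
    summary = []
    for obj in spec["ingestion_definition"]["objects"]:
        # Each object holds a single key: "table" or "report".
        kind, body = next(iter(obj.items()))
        config = body.get("table_configuration", {})
        summary.append(
            {
                "kind": kind,
                "include_columns": config.get("include_columns"),
                "exclude_columns": config.get("exclude_columns"),
            }
        )
    return summary


# Sample spec with made-up values, shaped like the notebook examples.
example_spec = """
{
  "name": "example",
  "ingestion_definition": {
    "connection_name": "example-connection",
    "objects": [
      {
        "report": {
          "source_url": "https://example.invalid/report",
          "destination_catalog": "main",
          "destination_schema": "hr",
          "table_configuration": {
            "include_columns": ["a", "b", "c"]
          }
        }
      }
    ]
  }
}
"""

for entry in summarize_column_selection(example_spec):
    print(entry)
```

A value of None for both properties means the object uses the default behavior and ingests all columns.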