Create multi-destination pipelines
Applies to: API-based pipeline authoring · SaaS connectors · Database connectors
Using managed ingestion connectors in Lakeflow Connect, you can write to multiple destination catalogs and schemas from one pipeline. You can also ingest the same object multiple times, including into the same destination schema. However, managed connectors don't support duplicate table names in the same destination schema, so when two copies of an object land in the same schema, you must specify a new name for one of the tables to differentiate between them. See Name a destination table.
Example: Ingest two objects into different schemas
The example pipeline definitions in this section show how to ingest two objects into different schemas, depending on the pipeline creation interface and the source system.
Google Analytics
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline>
      catalog: <target-catalog-1> # Location of the pipeline event log
      schema: <target-schema-1> # Location of the pipeline event log
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-1-id>
              source_schema: <property-1-name>
              destination_catalog: <target-catalog-1>
              destination_schema: <target-schema-1>
          - table:
              source_catalog: <project-2-id>
              source_schema: <property-2-name>
              destination_catalog: <target-catalog-2>
              destination_schema: <target-schema-2>
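After adding the resource file to your bundle, you could deploy it with the standard bundle commands. The following is a minimal sketch that assumes a target named dev is defined in your databricks.yml:

# Check the bundle configuration for errors
databricks bundle validate

# Deploy the bundle, which creates or updates the pipeline
databricks bundle deploy --target dev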
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-1-id>",
          "source_schema": "<property-1-name>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-2-id>",
          "source_schema": "<property-2-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_catalog": "<project-1-id>",
                "source_schema": "<property-1-name>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-2-id>",
                "source_schema": "<property-2-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
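The JSON above is in bundle format. To create the pipeline directly with the Databricks CLI instead, one approach is to save just the pipeline settings (the object under pipeline_ga4) to a file and pass it to the create command. A minimal sketch, assuming the settings are saved as pipeline.json:

# Create the pipeline from a saved settings file
databricks pipelines create --json @pipeline.json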
MySQL
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_mysql:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "MYSQL",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      }
    ]
  }
}
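Because the ingestion pipeline references the gateway by ID, the two specs are submitted in order. The following is a minimal sketch, assuming the databricks-sdk Python package is installed and can authenticate from the notebook environment:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create the gateway first and capture its pipeline ID
gateway = w.api_client.do("POST", "/api/2.0/pipelines", body=gateway_pipeline_spec)

# Wire the gateway ID into the ingestion spec, then create the ingestion pipeline
ingestion_pipeline_spec["ingestion_definition"]["ingestion_gateway_id"] = gateway["pipeline_id"]
ingestion = w.api_client.do("POST", "/api/2.0/pipelines", body=ingestion_pipeline_spec)
print(ingestion["pipeline_id"])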
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<pipeline-name>"'",
"ingestion_definition": {
"ingestion_gateway_id": "'"<gateway-id>"'",
"objects": [
{"table": {
"source_schema": "<source-schema-1>",
"source_table": "<source-table-1>",
"destination_catalog": "'"<destination-catalog-1>"'",
"destination_schema": "'"<destination-schema-1>"'"
}},
{"table": {
"source_schema": "<source-schema-2>",
"source_table": "<source-table-2>",
"destination_catalog": "'"<destination-catalog-2>"'",
"destination_schema": "'"<destination-schema-2>"'"
}}
]
}
}'
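Because the ingestion pipeline references the gateway by ID, you could capture the ID from the gateway creation response and substitute it for <gateway-id>. A minimal sketch, assuming jq is installed:

# Create the gateway and extract the new pipeline ID from the JSON response
GATEWAY_ID=$(databricks pipelines create --json '{
  "name": "<gateway-name>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<staging-catalog>",
    "gateway_storage_schema": "<staging-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}' | jq -r '.pipeline_id')

# Use the captured ID as <gateway-id> when creating the ingestion pipeline
echo "$GATEWAY_ID"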
Salesforce
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_sfdc:
name: <pipeline>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_sfdc": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_schema": "<source-schema-1>",
                "source_table": "<source-table-1>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema-2>",
                "source_table": "<source-table-2>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
SQL Server
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_sqlserver:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<pipeline-name>"'",
"ingestion_definition": {
"ingestion_gateway_id": "'"<gateway-id>"'",
"objects": [
{"table": {
"source_catalog": "<source-catalog>",
"source_schema": "<source-schema>",
"source_table": "<source-table>",
"destination_catalog": "'"<destination-catalog-1>"'",
"destination_schema": "'"<destination-schema-1>"'"
}},
{"table": {
"source_catalog": "<source-catalog>",
"source_schema": "<source-schema>",
"source_table": "<source-table>",
"destination_catalog": "'"<destination-catalog-2>"'",
"destination_schema": "'"<destination-schema-2>"'"
}}
]
}
}'
Workday
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_workday:
name: <pipeline>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- report:
source_url: <report-url-1>
destination_catalog: <target-catalog-1>
destination_schema: <target-schema-1>
- report:
source_url: <report-url-2>
destination_catalog: <target-catalog-2>
destination_schema: <target-schema-2>
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url-1>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "report": {
          "source_url": "<report-url-2>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url-1>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "report": {
                "source_url": "<report-url-2>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
Example: Ingest one object three times
The following example pipeline definitions show how to ingest one object into three different destination tables. Because the second and third copies are ingested into the same destination schema, and duplicate table names in a schema aren't supported, the third destination table is given a unique name to differentiate it.
Google Analytics
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline-name>
      catalog: <target-catalog-1> # Location of the pipeline event log
      schema: <target-schema-1> # Location of the pipeline event log
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-1> # Location of first copy
              destination_schema: <target-schema-1> # Location of first copy
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-2> # Location of second copy
              destination_schema: <target-schema-2> # Location of second copy
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-2> # Location of third copy
              destination_schema: <target-schema-2> # Location of third copy
              destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:

pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}
MySQL
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_mysql:
name: <pipeline-name>
catalog: <destination-catalog-1> # Location of the pipeline event log
schema: <destination-schema-1> # Location of the pipeline event log
ingestion_definition:
ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-1> # Location of first copy
destination_schema: <destination-schema-1> # Location of first copy
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of second copy
destination_schema: <destination-schema-2> # Location of second copy
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of third copy
destination_schema: <destination-schema-2> # Location of third copy
destination_table: <custom-destination-table-name> # Specify destination table name
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "MYSQL",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>",
          "destination_table": "<custom-destination-table-name>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
  "name": "'"<pipeline-name>"'",
  "ingestion_definition": {
    "ingestion_gateway_id": "'"<gateway-id>"'",
    "objects": [
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-1>"'",
        "destination_schema": "'"<destination-schema-1>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'",
        "destination_table": "<custom-destination-table-name>"
      }}
    ]
  }
}'
Salesforce
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_sfdc:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-1> # Location of first copy
destination_schema: <target-schema-1> # Location of first copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-2> # Location of second copy
destination_schema: <target-schema-2> # Location of second copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-2> # Location of third copy
destination_schema: <target-schema-2> # Location of third copy
destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_sfdc": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}
SQL Server
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_sqlserver:
name: <pipeline-name>
catalog: <destination-catalog-1> # Location of the pipeline event log
schema: <destination-schema-1> # Location of the pipeline event log
ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-1> # Location of first copy
destination_schema: <destination-schema-1> # Location of first copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of second copy
destination_schema: <destination-schema-2> # Location of second copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of third copy
destination_schema: <destination-schema-2> # Location of third copy
destination_table: <custom-destination-table-name> # Specify destination table name
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>",
          "destination_table": "<custom-destination-table-name>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
  "name": "'"<pipeline-name>"'",
  "ingestion_definition": {
    "ingestion_gateway_id": "'"<gateway-id>"'",
    "objects": [
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-1>"'",
        "destination_schema": "'"<destination-schema-1>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'",
        "destination_table": "<custom-destination-table-name>"
      }}
    ]
  }
}'
Workday
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
    pipeline_workday:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- report:
source_url: <report-url>
destination_catalog: <target-catalog-1> # Location of first copy
destination_schema: <target-schema-1> # Location of first copy
- report:
source_url: <report-url>
destination_catalog: <target-catalog-2> # Location of second copy
destination_schema: <target-schema-2> # Location of second copy
- report:
source_url: <report-url>
destination_catalog: <target-catalog-2> # Location of third copy
destination_schema: <target-schema-2> # Location of third copy
destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}