Create multi-destination pipelines
Applies to: API-based pipeline authoring · SaaS connectors · Database connectors
Using managed ingestion connectors in Lakeflow Connect, you can write to multiple destination catalogs and schemas from one pipeline. You can also ingest the same object multiple times, including into the same destination schema. However, managed connectors don't support duplicate table names in the same destination schema, so when two copies of an object land in the same schema, you must specify a new name for one of the tables to differentiate between them. See Name a destination table.
Example: Ingest two objects into different schemas
The example pipeline definitions in this section show how to ingest two objects into different schemas, depending on the pipeline creation interface and the source system.
Google Analytics
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline>
      catalog: <target-catalog-1> # Location of the pipeline event log
      schema: <target-schema-1> # Location of the pipeline event log
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-1-id>
              source_schema: <property-1-name>
              destination_catalog: <target-catalog-1>
              destination_schema: <target-schema-1>
          - table:
              source_catalog: <project-2-id>
              source_schema: <property-2-name>
              destination_catalog: <target-catalog-2>
              destination_schema: <target-schema-2>
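After adding the resource file to your bundle, you could deploy it with the standard bundle commands. The following is a minimal sketch that assumes a target named dev is defined in your databricks.yml:

# Check the bundle configuration for errors
databricks bundle validate

# Deploy the bundle, which creates or updates the pipeline
databricks bundle deploy --target dev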
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-1-id>",
          "source_schema": "<property-1-name>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-2-id>",
          "source_schema": "<property-2-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_catalog": "<project-1-id>",
                "source_schema": "<property-1-name>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-2-id>",
                "source_schema": "<property-2-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
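The JSON above is in bundle format. To create the pipeline directly with the Databricks CLI instead, one approach is to save just the pipeline settings (the object under pipeline_ga4) to a file and pass it to the create command. A minimal sketch, assuming the settings are saved as pipeline.json:

# Create the pipeline from a saved settings file
databricks pipelines create --json @pipeline.json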
MySQL
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_mysql:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "MYSQL",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      }
    ]
  }
}
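Because the ingestion pipeline references the gateway by ID, the two specs are submitted in order. The following is a minimal sketch, assuming the databricks-sdk Python package is installed and can authenticate from the notebook environment:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create the gateway first and capture its pipeline ID
gateway = w.api_client.do("POST", "/api/2.0/pipelines", body=gateway_pipeline_spec)

# Wire the gateway ID into the ingestion spec, then create the ingestion pipeline
ingestion_pipeline_spec["ingestion_definition"]["ingestion_gateway_id"] = gateway["pipeline_id"]
ingestion = w.api_client.do("POST", "/api/2.0/pipelines", body=ingestion_pipeline_spec)
print(ingestion["pipeline_id"])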
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<pipeline-name>"'",
"ingestion_definition": {
"ingestion_gateway_id": "'"<gateway-id>"'",
"objects": [
{"table": {
"source_schema": "<source-schema-1>",
"source_table": "<source-table-1>",
"destination_catalog": "'"<destination-catalog-1>"'",
"destination_schema": "'"<destination-schema-1>"'"
}},
{"table": {
"source_schema": "<source-schema-2>",
"source_table": "<source-table-2>",
"destination_catalog": "'"<destination-catalog-2>"'",
"destination_schema": "'"<destination-schema-2>"'"
}}
]
}
}'
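Because the ingestion pipeline references the gateway by ID, you could capture the ID from the gateway creation response and substitute it for <gateway-id>. A minimal sketch, assuming jq is installed:

# Create the gateway and extract the new pipeline ID from the JSON response
GATEWAY_ID=$(databricks pipelines create --json '{
  "name": "<gateway-name>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<staging-catalog>",
    "gateway_storage_schema": "<staging-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}' | jq -r '.pipeline_id')

# Use the captured ID as <gateway-id> when creating the ingestion pipeline
echo "$GATEWAY_ID"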
Salesforce
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_sfdc:
name: <pipeline>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_sfdc": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_schema": "<source-schema-1>",
                "source_table": "<source-table-1>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema-2>",
                "source_table": "<source-table-2>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
SQL Server
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_sqlserver:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema-1>
source_table: <source-table-1>
destination_catalog: <target-catalog-1> # Location of this table
destination_schema: <target-schema-1> # Location of this table
- table:
source_schema: <source-schema-2>
source_table: <source-table-2>
destination_catalog: <target-catalog-2> # Location of this table
destination_schema: <target-schema-2> # Location of this table
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema-1>",
          "source_table": "<source-table-1>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema-2>",
          "source_table": "<source-table-2>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<pipeline-name>"'",
"ingestion_definition": {
"ingestion_gateway_id": "'"<gateway-id>"'",
"objects": [
{"table": {
"source_catalog": "<source-catalog>",
"source_schema": "<source-schema>",
"source_table": "<source-table>",
"destination_catalog": "'"<destination-catalog-1>"'",
"destination_schema": "'"<destination-schema-1>"'"
}},
{"table": {
"source_catalog": "<source-catalog>",
"source_schema": "<source-schema>",
"source_table": "<source-table>",
"destination_catalog": "'"<destination-catalog-2>"'",
"destination_schema": "'"<destination-schema-2>"'"
}}
]
}
}'
Workday
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_workday:
name: <pipeline>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- report:
source_url: <report-url-1>
destination_catalog: <target-catalog-1>
destination_schema: <target-schema-1>
- report:
source_url: <report-url-2>
destination_catalog: <target-catalog-2>
destination_schema: <target-schema-2>
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url-1>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "report": {
          "source_url": "<report-url-2>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url-1>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "report": {
                "source_url": "<report-url-2>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            }
          ]
        }
      }
    }
  }
}
Example: Ingest one object three times
The following example pipeline definitions show how to ingest one object into three different destination tables. Because the second and third copies are ingested into the same destination schema, and duplicate table names in a schema aren't supported, the third destination table is given a unique name to differentiate it.
Google Analytics
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline-name>
      catalog: <target-catalog-1> # Location of the pipeline event log
      schema: <target-schema-1> # Location of the pipeline event log
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-1> # Location of first copy
              destination_schema: <target-schema-1> # Location of first copy
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-2> # Location of second copy
              destination_schema: <target-schema-2> # Location of second copy
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              destination_catalog: <target-catalog-2> # Location of third copy
              destination_schema: <target-schema-2> # Location of third copy
              destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:

pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}
MySQL
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_mysql:
name: <pipeline-name>
catalog: <destination-catalog-1> # Location of the pipeline event log
schema: <destination-schema-1> # Location of the pipeline event log
ingestion_definition:
ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-1> # Location of first copy
destination_schema: <destination-schema-1> # Location of first copy
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of second copy
destination_schema: <destination-schema-2> # Location of second copy
- table:
source_catalog: <source-catalog>
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of third copy
destination_schema: <destination-schema-2> # Location of third copy
destination_table: <custom-destination-table-name> # Specify destination table name
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "MYSQL",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>",
          "destination_table": "<custom-destination-table-name>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
  "name": "'"<pipeline-name>"'",
  "ingestion_definition": {
    "ingestion_gateway_id": "'"<gateway-id>"'",
    "objects": [
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-1>"'",
        "destination_schema": "'"<destination-schema-1>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'",
        "destination_table": "<custom-destination-table-name>"
      }}
    ]
  }
}'
Salesforce
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
pipeline_sfdc:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-1> # Location of first copy
destination_schema: <target-schema-1> # Location of first copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-2> # Location of second copy
destination_schema: <target-schema-2> # Location of second copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <target-catalog-2> # Location of third copy
destination_schema: <target-schema-2> # Location of third copy
destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_sfdc": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "table": {
                "source_schema": "<source-schema>",
                "source_table": "<source-table>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}
SQL Server
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML resource file that you can use in your bundle:
resources:
pipelines:
gateway:
name: <gateway-name>
gateway_definition:
connection_id: <connection-id>
gateway_storage_catalog: <destination-catalog>
gateway_storage_schema: <destination-schema>
        gateway_storage_name: <gateway-name>
target: <destination-schema>
catalog: <destination-catalog>
pipeline_sqlserver:
name: <pipeline-name>
catalog: <destination-catalog-1> # Location of the pipeline event log
schema: <destination-schema-1> # Location of the pipeline event log
ingestion_definition:
        ingestion_gateway_id: ${resources.pipelines.gateway.id}
objects:
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-1> # Location of first copy
destination_schema: <destination-schema-1> # Location of first copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of second copy
destination_schema: <destination-schema-2> # Location of second copy
- table:
source_schema: <source-schema>
source_table: <source-table>
destination_catalog: <destination-catalog-2> # Location of third copy
destination_schema: <destination-schema-2> # Location of third copy
destination_table: <custom-destination-table-name> # Specify destination table name
The following are example ingestion gateway and ingestion pipeline specs that you can use in a Python notebook:
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<gateway-name>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-1>",
          "destination_schema": "<destination-schema-1>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>"
        }
      },
      {
        "table": {
          "source_catalog": "<source-catalog>",
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog-2>",
          "destination_schema": "<destination-schema-2>",
          "destination_table": "<custom-destination-table-name>"
        }
      }
    ]
  }
}
To create the ingestion gateway using the Databricks CLI:
databricks pipelines create --json '{
"name": "'"<gateway-name>"'",
"gateway_definition": {
"connection_id": "'"<connection-id>"'",
"gateway_storage_catalog": "'"<staging-catalog>"'",
"gateway_storage_schema": "'"<staging-schema>"'",
"gateway_storage_name": "'"<gateway-name>"'"
}
}'
To create the ingestion pipeline using the Databricks CLI:
databricks pipelines create --json '{
  "name": "'"<pipeline-name>"'",
  "ingestion_definition": {
    "ingestion_gateway_id": "'"<gateway-id>"'",
    "objects": [
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-1>"'",
        "destination_schema": "'"<destination-schema-1>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'"
      }},
      {"table": {
        "source_catalog": "<source-catalog>",
        "source_schema": "<source-schema>",
        "source_table": "<source-table>",
        "destination_catalog": "'"<destination-catalog-2>"'",
        "destination_schema": "'"<destination-schema-2>"'",
        "destination_table": "<custom-destination-table-name>"
      }}
    ]
  }
}'
Workday
- Databricks Asset Bundles
- Databricks notebook
- Databricks CLI
The following is an example YAML file that you can use in your bundles:
resources:
pipelines:
    pipeline_workday:
name: <pipeline-name>
catalog: <target-catalog-1> # Location of the pipeline event log
schema: <target-schema-1> # Location of the pipeline event log
ingestion_definition:
connection_name: <connection>
objects:
- report:
source_url: <report-url>
destination_catalog: <target-catalog-1> # Location of first copy
destination_schema: <target-schema-1> # Location of first copy
- report:
source_url: <report-url>
destination_catalog: <target-catalog-2> # Location of second copy
destination_schema: <target-schema-2> # Location of second copy
- report:
source_url: <report-url>
destination_catalog: <target-catalog-2> # Location of third copy
destination_schema: <target-schema-2> # Location of third copy
destination_table: <custom-target-table-name> # Specify destination table name
The following is an example Python pipeline spec that you can use in your notebook:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-1>",
          "destination_schema": "<target-schema-1>"
        }
      },
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>"
        }
      },
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog-2>",
          "destination_schema": "<target-schema-2>",
          "destination_table": "<custom-target-table-name>"
        }
      }
    ]
  }
}
"""
The following is an example JSON pipeline definition that you can use with CLI commands:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog-1>",
        "schema": "<target-schema-1>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-1>",
                "destination_schema": "<target-schema-1>"
              }
            },
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>"
              }
            },
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<target-catalog-2>",
                "destination_schema": "<target-schema-2>",
                "destination_table": "<custom-target-table-name>"
              }
            }
          ]
        }
      }
    }
  }
}