Enable history tracking (SCD type 2)
The history tracking setting, also known as the slowly changing dimensions (SCD) setting, determines how changes to your data are handled over time. Turn history tracking off (SCD type 1) to overwrite outdated records as they're updated or deleted in the source. Turn history tracking on (SCD type 2) to keep a history of those changes. Deleting a table or column in the source does not delete that data from the destination, even when SCD type 1 is selected.
For example, let's say that you ingest the following table:
Let's also say that Alice's favorite color changes to purple on January 2.
If history tracking is off (SCD type 1), the next run of the ingestion pipeline updates that row in the destination table.
If history tracking is on (SCD type 2), the ingestion pipeline keeps the old row and adds the update as a new row. It marks the old row as inactive so that you know which row is up to date.
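To make the difference concrete, here is a minimal in-memory sketch of both behaviors applied to Alice's update. It only illustrates the semantics and is not how the connector is implemented. The `start_at` and `end_at` column names, Alice's earlier color (`blue`), and the year 2024 are hypothetical; the columns the connector actually uses to mark rows as active or inactive may differ.

```python
from datetime import date

def apply_scd1(table, name, changes):
    """SCD type 1: overwrite the matching row in place; the old value is lost."""
    return [{**row, **changes} if row["name"] == name else row for row in table]

def apply_scd2(table, name, changes, changed_on):
    """SCD type 2: close the active row for `name` and append a new active row.

    Handles updates only; a key with no active row is left unchanged.
    """
    result, new_versions = [], []
    for row in table:
        if row["name"] == name and row["end_at"] is None:
            # Keep the old version, but mark it inactive as of the change date.
            result.append({**row, "end_at": changed_on})
            new_versions.append({**row, **changes, "start_at": changed_on, "end_at": None})
        else:
            result.append(row)
    return result + new_versions

def current_rows(table):
    """Active rows are the ones that have not been closed (end_at is None)."""
    return [row for row in table if row["end_at"] is None]

# Hypothetical starting state: Alice's earlier favorite color was blue.
scd1_table = [{"name": "Alice", "favorite_color": "blue"}]
scd2_table = [{"name": "Alice", "favorite_color": "blue",
               "start_at": date(2024, 1, 1), "end_at": None}]

# Alice's favorite color changes to purple on January 2.
scd1_table = apply_scd1(scd1_table, "Alice", {"favorite_color": "purple"})
scd2_table = apply_scd2(scd2_table, "Alice", {"favorite_color": "purple"}, date(2024, 1, 2))

print(scd1_table)
# [{'name': 'Alice', 'favorite_color': 'purple'}]
print(len(scd2_table), current_rows(scd2_table)[0]["favorite_color"])
# 2 purple
```

With SCD type 1 the table still has one row; with SCD type 2 it has two, and only the purple row is active.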
Not all connectors support SCD type 2. For a list of supported connectors, see Feature compatibility.
Example: Google Analytics
By default, history tracking is off (SCD type 1). You can use the following example YAML file to change this setting in a bundle:
resources:
  pipelines:
    pipeline_ga4:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - table:
              source_catalog: <project-id>
              source_schema: <property-name>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                scd_type: SCD_TYPE_2
By default, history tracking is off (SCD type 1). You can use the following example Python code in a notebook to change this setting:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_catalog": "<project-id>",
          "source_schema": "<property-name>",
          "source_table": "<source-table>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "scd_type": "SCD_TYPE_2"
          }
        }
      }
    ]
  }
}
"""
By default, history tracking is off (SCD type 1). You can use the following example JSON spec in a CLI command to change this setting:
{
  "resources": {
    "pipelines": {
      "pipeline_ga4": {
        "name": "<pipeline>",
        "catalog": "<target-catalog>",
        "schema": "<target-schema>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "table": {
                "source_catalog": "<project-id>",
                "source_schema": "<property-name>",
                "source_table": "<source-table>",
                "destination_catalog": "<destination-catalog>",
                "destination_schema": "<destination-schema>",
                "table_configuration": {
                  "scd_type": "SCD_TYPE_2"
                }
              }
            }
          ]
        }
      }
    }
  }
}
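If you keep a spec like this in a file and want to toggle history tracking without hand-editing nested JSON, a small helper can do it. The following is a minimal sketch, not part of any Databricks tool: the function name `set_scd_type` is hypothetical, it accepts a spec with or without the `resources.pipelines` wrapper shown above, and it deliberately accepts only the two `scd_type` values that appear in this section.

```python
import copy
import json

# Assumption of this sketch: only the two values used in this section are accepted.
SUPPORTED_SCD_TYPES = {"SCD_TYPE_1", "SCD_TYPE_2"}

def set_scd_type(spec, scd_type):
    """Hypothetical helper: return a copy of `spec` with scd_type set on every
    table and report object.

    Works on a bare pipeline spec or on one nested under resources.pipelines.
    """
    if scd_type not in SUPPORTED_SCD_TYPES:
        raise ValueError(f"unsupported scd_type: {scd_type}")
    updated = copy.deepcopy(spec)  # never mutate the caller's dict
    # Fall back to treating the whole spec as a single pipeline definition.
    pipelines = updated.get("resources", {}).get("pipelines") or {"": updated}
    for pipeline in pipelines.values():
        for obj in pipeline.get("ingestion_definition", {}).get("objects", []):
            for kind in ("table", "report"):
                if kind in obj:
                    config = obj[kind].setdefault("table_configuration", {})
                    config["scd_type"] = scd_type
    return updated

cli_spec = json.loads("""
{"resources": {"pipelines": {"pipeline_ga4": {
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [{"table": {"source_catalog": "<project-id>"}}]}}}}}
""")

scd2_spec = set_scd_type(cli_spec, "SCD_TYPE_2")
print(json.dumps(scd2_spec, indent=2))
```

Because the helper deep-copies its input, the original spec is left unchanged; write the result back out with `json.dump` before passing it to the CLI.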
Example: Workday
By default, history tracking is off (SCD type 1). The following example YAML changes this setting using Databricks Asset Bundles.
resources:
  pipelines:
    pipeline_workday:
      name: <pipeline>
      catalog: <target-catalog>
      schema: <target-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - report:
              source_url: <report-url>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                scd_type: SCD_TYPE_2
You can use the following example Python code in a notebook to change this setting:
pipeline_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "report": {
          "source_url": "<report-url>",
          "destination_catalog": "<target-catalog>",
          "destination_schema": "<target-schema>",
          "table_configuration": {
            "scd_type": "SCD_TYPE_2"
          }
        }
      }
    ]
  }
}
"""
You can use the following example JSON spec in a CLI command to change this setting:
{
  "resources": {
    "pipelines": {
      "pipeline_workday": {
        "name": "<pipeline>",
        "catalog": "<target-catalog>",
        "schema": "<target-schema>",
        "ingestion_definition": {
          "connection_name": "<connection>",
          "objects": [
            {
              "report": {
                "source_url": "<report-url>",
                "destination_catalog": "<destination-catalog>",
                "destination_schema": "<destination-schema>",
                "table_configuration": {
                  "scd_type": "SCD_TYPE_2"
                }
              }
            }
          ]
        }
      }
    }
  }
}
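Before running a pipeline, you may want to confirm which SCD type each object will actually get, since omitting `table_configuration` leaves history tracking off. The sketch below uses a hypothetical `effective_scd_types` helper and a hypothetical second report URL. It parses a spec string with Python's `json` module, applies the SCD type 1 default to objects that don't set `scd_type`, and shows that a trailing comma after the last key is rejected as invalid JSON.

```python
import json

def effective_scd_types(spec_text):
    """Hypothetical helper: map each ingested object to its effective SCD type.

    Accepts a bare pipeline spec (as in the notebook examples) or one nested
    under resources.pipelines (as in the CLI examples). Objects without
    table_configuration.scd_type fall back to SCD_TYPE_1, the documented default.
    """
    spec = json.loads(spec_text)  # raises json.JSONDecodeError on invalid JSON
    pipelines = spec.get("resources", {}).get("pipelines") or {"": spec}
    result = {}
    for pipeline in pipelines.values():
        for obj in pipeline["ingestion_definition"]["objects"]:
            for kind, settings in obj.items():  # kind is "table" or "report"
                source = settings.get("source_url") or settings.get("source_table")
                config = settings.get("table_configuration", {})
                result[f"{kind}:{source}"] = config.get("scd_type", "SCD_TYPE_1")
    return result

workday_spec = """
{
  "name": "<pipeline>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {"report": {"source_url": "<report-url>",
                  "table_configuration": {"scd_type": "SCD_TYPE_2"}}},
      {"report": {"source_url": "<other-report-url>"}}
    ]
  }
}
"""

print(effective_scd_types(workday_spec))
# {'report:<report-url>': 'SCD_TYPE_2', 'report:<other-report-url>': 'SCD_TYPE_1'}

# A trailing comma after the last key is not valid JSON, so parsing fails early.
try:
    effective_scd_types('{"ingestion_definition": {"objects": [{"report": {"scd_type": "SCD_TYPE_2",}}]}}')
except json.JSONDecodeError as err:
    print("invalid spec:", err.msg)
```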