Fully refresh target tables

Applies to: SaaS connectors, Database connectors

Fully refreshing the ingestion pipeline clears the data and state of the target tables, then reprocesses all records from the data source. You can fully refresh all tables in the pipeline or select tables to refresh.

Interface	Instructions
Lakehouse UI	Manually trigger a pipeline update
Pipelines API	`POST /api/2.0/pipelines/{pipeline_id}/updates`
Databricks CLI	`databricks pipelines start-update`
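For the Pipelines API row, here is a minimal Python sketch that builds the start-update request using only the standard library. It assumes you supply a workspace URL (`host`) and a personal access token (`token`). It uses the `full_refresh` and `full_refresh_selection` fields of the start-update request body. The helper name `build_full_refresh_request` is hypothetical. Check the Pipelines API reference for the exact table-name format that `full_refresh_selection` expects.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_full_refresh_request(
    host: str,
    token: str,
    pipeline_id: str,
    tables: Optional[Sequence[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a start-update request that triggers a full refresh.

    host, token, and this helper's name are assumptions for illustration.
    """
    url = f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates"
    if tables:
        # Fully refresh only the listed tables.
        body = {"full_refresh_selection": list(tables)}
    else:
        # Fully refresh every table in the pipeline.
        body = {"full_refresh": True}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`. The JSON response contains the `update_id` of the new update.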

Important

The ingestion pipeline update might fail during the Initializing or Resetting tables phase. Lakeflow Connect automatically retries the pipeline several times. If you interrupt the automatic retries, or if they ultimately fail, manually start a new pipeline update with the same table refresh selection as before. Otherwise, the target tables can be left in an inconsistent state with partial data. If manual retries also fail, create a support ticket.
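The manual-retry rule comes down to one invariant: every retry must use the same table refresh selection. The following sketch freezes the selection once and reuses it on each attempt. It assumes a hypothetical `run_update` callable that starts an update for the given tables, waits for it to finish, and returns its terminal state string, such as `COMPLETED` or `FAILED`.

```python
from typing import Callable, List, Sequence


def rerun_with_same_selection(
    run_update: Callable[[List[str]], str],
    full_refresh_selection: Sequence[str],
    max_attempts: int = 3,
) -> int:
    """Retry a full refresh with an unchanged table selection.

    run_update is hypothetical: it starts an update for the given tables,
    waits for it to finish, and returns the update's terminal state.
    Returns the number of attempts that were needed.
    """
    # Freeze the selection so no attempt can drift to a different set of tables.
    selection = tuple(full_refresh_selection)
    for attempt in range(1, max_attempts + 1):
        if run_update(list(selection)) == "COMPLETED":
            return attempt
    # Manual retries also failed: the documented next step is a support ticket.
    raise RuntimeError(
        f"Full refresh of {list(selection)} failed {max_attempts} times; "
        "create a support ticket."
    )
```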

Full refresh behavior (CDC)

Applies to: Database connectors

When you trigger a full refresh of a table, Databricks optimizes the process to reduce downtime and maintain data availability:

Snapshot request: When you request a full refresh, the ingestion gateway immediately begins creating a new snapshot of the source table. The destination streaming table is excluded from refresh selection until the snapshot completes.
Continued availability: During the snapshot process, the destination streaming table retains its existing data and remains available for queries. No updates, appends, or deletes are applied to the table while the snapshot is in progress.
Atomic refresh: After the snapshot completes, Databricks automatically performs the full refresh in a single update. This update applies all snapshot data and any CDC records accumulated since the snapshot was requested.

For example, if your table has 50 records at the end of update 15, and you request a full refresh in update 16:

The ingestion gateway begins creating a snapshot during update 16.
The table continues to show the original 50 records until the snapshot completes.
When the snapshot completes (in update 16 or later, depending on the source table size), the full refresh is automatically applied in one atomic operation.
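To make this sequence concrete, here is a toy Python model of what queries see during a full refresh. It illustrates the documented behavior only and is not how Databricks implements it. Rows are a dict keyed by primary key, and CDC events are hypothetical `(operation, key, payload)` tuples. The refreshed state is built separately and then published in a single assignment, which mirrors the atomic refresh.

```python
from typing import Dict, List, Tuple

Rows = Dict[int, str]                # primary key -> row payload
CdcEvent = Tuple[str, int, str]      # hypothetical (operation, key, payload)


def simulate_full_refresh(
    current: Rows,
    snapshot: Rows,
    cdc_since_request: List[CdcEvent],
    request_update: int,
    snapshot_done_update: int,
) -> Dict[int, Rows]:
    """Return the rows visible to queries after each update (toy model)."""
    if snapshot_done_update < request_update:
        raise ValueError("the snapshot cannot finish before it is requested")
    visible: Dict[int, Rows] = {}
    for update in range(request_update, snapshot_done_update + 1):
        if update < snapshot_done_update:
            # Snapshot still running: keep serving the existing data unchanged.
            visible[update] = dict(current)
            continue
        # Snapshot done: apply snapshot data plus accumulated CDC off to the side...
        refreshed = dict(snapshot)
        for op, key, payload in cdc_since_request:
            if op == "DELETE":
                refreshed.pop(key, None)
            else:  # treat INSERT and UPDATE as upserts
                refreshed[key] = payload
        # ...then publish the new state in one step.
        visible[update] = refreshed
    return visible
```

Consider the example above, with 50 rows, a request in update 16, and a snapshot that finishes in update 18. In this model, updates 16 and 17 both return the original 50 rows. Update 18 returns the snapshot with the accumulated CDC records applied.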

Because the table stays queryable until the refresh is applied, this approach reduces downtime during full refresh operations and helps prevent `PENDING_RESET` and timeout errors.

Configure full refresh behavior for database connectors

Applies to: Database connectors, API-based pipeline authoring

Learn how to configure full refresh behavior for managed ingestion pipelines with database connectors (like SQL Server) in Lakeflow Connect. You can schedule when full refresh snapshots occur, and you can enable automatic full refresh to recover from unsupported DDL operations such as incompatible schema changes.

Full refresh window

A full refresh window lets you schedule when full refresh snapshots run. When you request a full refresh, or when the system triggers one automatically, the snapshot starts at the next time that falls within the configured window. The following table shows how the scheduling works:

Request time	Window	Snapshot starts	Notes
Monday 2025-10-20 10:00:00 UTC	Start hour: 20, Days: Tuesday, Time zone: UTC	Tuesday 2025-10-21 20:00:00 UTC	Snapshot deferred to next available window day
Monday 2025-10-20 09:30:00 UTC	Start hour: 9, Days: Monday, Time zone: UTC	Monday 2025-10-20 09:30:00 UTC	Same day, request time within window
Monday 2025-10-20 10:00:00 UTC	Start hour: 9, Days: Monday, Time zone: UTC	Monday 2025-10-27 09:00:00 UTC	Request time past window, deferred to next week
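The scheduling rule in the table can be sketched in Python as follows. This is an illustration, not the service's scheduler, and it makes three simplifications:

- It assumes the window stays open for one hour after `start_hour`, as the second and third rows imply. In those rows, a 09:30 request starts immediately, and a 10:00 request waits until the following Monday.
- It ignores daylight-saving edge cases.
- The function name `next_snapshot_start` is hypothetical.

Pass a `tzinfo` for the configured `time_zone_id`, for example `zoneinfo.ZoneInfo("America/Los_Angeles")`.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional, Sequence

DAY_NAMES = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY")  # index matches date.weekday()

# ASSUMPTION: the window lasts one hour, as the table's examples imply.
WINDOW_LENGTH = timedelta(hours=1)


def next_snapshot_start(
    request_time: datetime,
    start_hour: int,
    days_of_week: Optional[Sequence[str]] = None,
    tz: tzinfo = timezone.utc,
) -> datetime:
    """Return when a full refresh snapshot would start (illustrative only).

    request_time must be timezone-aware.
    """
    allowed = set(days_of_week or DAY_NAMES)  # unspecified means every day
    local = request_time.astimezone(tz)
    # Today plus the next seven days always covers every weekday.
    for offset in range(8):
        day = (local + timedelta(days=offset)).date()
        if DAY_NAMES[day.weekday()] not in allowed:
            continue
        window_start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
        if local < window_start:
            return window_start  # wait for this day's window to open
        if local < window_start + WINDOW_LENGTH:
            return local  # already inside the window: start right away
    raise ValueError("days_of_week must contain at least one valid day name")
```

Feeding in the three rows of the table returns the following start times:

- Tuesday 2025-10-21 20:00 UTC
- Monday 2025-10-20 09:30 UTC
- Monday 2025-10-27 09:00 UTC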

Configuration parameters

Configure the full refresh window with the `full_refresh_window` object in the `ingestion_definition` of your pipeline specification:

Parameter	Type	Description	Required
`start_hour`	Integer	Hour of the day (0-23, 24-hour clock) at which the window starts.	Yes
`days_of_week`	Array	Days when the window is active. Valid values: `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, `SATURDAY`, `SUNDAY`. If not specified, all days are used.	No
`time_zone_id`	String	Time zone ID for the window. See Set the session time zone for supported time zone IDs. Defaults to UTC if not specified.	No
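Before you submit a specification, you can check the `full_refresh_window` object against the parameter table. The following client-side sketch assumes day names must match exactly as listed (uppercase). It fills in the documented defaults (all days, `UTC`) but does not validate `time_zone_id`. The service performs its own validation regardless, and the helper name is hypothetical.

```python
from typing import Any, Dict

VALID_DAYS = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
              "FRIDAY", "SATURDAY", "SUNDAY")


def normalize_full_refresh_window(window: Dict[str, Any]) -> Dict[str, Any]:
    """Check a full_refresh_window dict and fill in the documented defaults."""
    if "start_hour" not in window:
        raise ValueError("start_hour is required")
    hour = window["start_hour"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(hour, bool) or not isinstance(hour, int) or not 0 <= hour <= 23:
        raise ValueError(f"start_hour must be an integer from 0 to 23, got {hour!r}")
    days = window.get("days_of_week") or list(VALID_DAYS)  # default: all days
    unknown = [d for d in days if d not in VALID_DAYS]
    if unknown:
        raise ValueError(f"invalid days_of_week values: {unknown}")
    return {
        "start_hour": hour,
        "days_of_week": list(days),
        "time_zone_id": window.get("time_zone_id", "UTC"),  # default: UTC
    }
```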

Example: Configure a full refresh window

The following examples show how to add a full refresh window to your pipeline definition.

Databricks Asset Bundles (YAML):
resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
        full_refresh_window:
          start_hour: 20
          days_of_week:
            - MONDAY
            - TUESDAY
          time_zone_id: 'America/Los_Angeles'

Databricks notebook (Python):
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<destination-schema>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "catalog": "<destination-catalog>",
  "schema": "<destination-schema>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>"
        }
      }
    ],
    "full_refresh_window": {
      "start_hour": 20,
      "days_of_week": ["MONDAY", "TUESDAY"],
      "time_zone_id": "America/Los_Angeles"
    }
  }
}
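In the notebook variant, the ingestion pipeline needs the gateway's pipeline ID. The following sketch shows the ordering. It assumes a hypothetical `post(path, body)` helper that sends an authenticated JSON request to the workspace and returns the parsed response. The sketch takes three steps:

1. Create the gateway with `POST /api/2.0/pipelines`.
2. Read `pipeline_id` from the response.
3. Copy that ID into `ingestion_gateway_id` before creating the ingestion pipeline.

```python
import copy
from typing import Any, Callable, Dict, Tuple

Spec = Dict[str, Any]


def create_gateway_then_ingestion(
    post: Callable[[str, Spec], Dict[str, Any]],
    gateway_spec: Spec,
    ingestion_spec: Spec,
) -> Tuple[str, str]:
    """Create the gateway first, then point the ingestion pipeline at it.

    post is a hypothetical helper that sends a JSON body to a REST path
    and returns the parsed JSON response.
    """
    gateway_id = post("/api/2.0/pipelines", gateway_spec)["pipeline_id"]
    spec = copy.deepcopy(ingestion_spec)  # leave the caller's template untouched
    spec["ingestion_definition"]["ingestion_gateway_id"] = gateway_id
    ingestion_id = post("/api/2.0/pipelines", spec)["pipeline_id"]
    return gateway_id, ingestion_id
```

With the specifications above, call `create_gateway_then_ingestion(post, gateway_pipeline_spec, ingestion_pipeline_spec)`.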

Auto full refresh policy

To help maintain data consistency without manual intervention, an auto full refresh policy automatically triggers a full refresh when the pipeline encounters any of the following unsupported DDL operations:

Table truncate
Incompatible schema changes (for example, data type changes)
Column renames
Column additions with default values

Without auto full refresh enabled, you must manually trigger a full refresh when these operations occur.

Configuration parameters

Configure auto full refresh at the pipeline level or table level in your pipeline specification:

Parameter	Type	Description	Default
`enabled`	Boolean	Whether auto full refresh is enabled.	`false`
`min_interval_hours`	Integer	Minimum number of hours between full refreshes. After the last snapshot, the system waits at least this long before it initiates another auto full refresh.	`24`
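The combined effect of `enabled` and `min_interval_hours` can be sketched as follows. This is an illustrative model with three stated assumptions:

- The DDL labels are hypothetical names for the operations listed earlier, not API values.
- A refresh that falls inside the interval is treated as deferred rather than dropped.
- Any full refresh window is ignored.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

# Hypothetical labels for the unsupported DDL operations; not API values.
TRIGGERING_DDL = frozenset({
    "TABLE_TRUNCATE",
    "INCOMPATIBLE_SCHEMA_CHANGE",
    "COLUMN_RENAME",
    "ADD_COLUMN_WITH_DEFAULT",
})


def auto_full_refresh_time(
    policy: Optional[Dict[str, Any]],
    ddl_operation: str,
    last_snapshot: datetime,
    detected_at: datetime,
) -> Optional[datetime]:
    """Return the earliest time an auto full refresh could start, or None.

    None means the policy does not start one; for the listed operations,
    a manual full refresh is then required.
    """
    policy = policy or {}
    if not policy.get("enabled", False):  # documented default: false
        return None
    if ddl_operation not in TRIGGERING_DDL:
        return None
    hours = policy.get("min_interval_hours", 24)  # documented default: 24
    earliest = last_snapshot + timedelta(hours=hours)
    # Wait out whatever remains of the interval since the last snapshot.
    return max(detected_at, earliest)
```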

You can configure auto full refresh at two levels:

Pipeline level: In `ingestion_definition.table_configuration.auto_full_refresh_policy`
Table level: In `ingestion_definition.objects[].table.table_configuration.auto_full_refresh_policy`

Table-level configuration overrides pipeline-level configuration.
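The following sketch resolves the effective policy for one table from a specification like the Python examples in this section. It makes two assumptions. First, a table-level `auto_full_refresh_policy` replaces the pipeline-level one as a whole; the documentation states only that the table level overrides the pipeline level. Second, missing fields fall back to the documented defaults. The function name is hypothetical.

```python
from typing import Any, Dict

DEFAULT_POLICY = {"enabled": False, "min_interval_hours": 24}  # documented defaults


def effective_policy(
    ingestion_definition: Dict[str, Any], table: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the auto full refresh policy that applies to one `table` entry."""
    policy = table.get("table_configuration", {}).get("auto_full_refresh_policy")
    if policy is None:
        # No table-level policy: inherit the pipeline-level one, if any.
        policy = ingestion_definition.get("table_configuration", {}).get(
            "auto_full_refresh_policy", {}
        )
    # ASSUMPTION: the chosen level wins as a whole; unset fields use defaults.
    return {**DEFAULT_POLICY, **policy}
```

Applied to each `table` entry in `objects` of the per-table example, this returns an enabled policy for `table_1` and a disabled one for `table_2`.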

Example: Configure auto full refresh at the pipeline level

The following examples show how to enable auto full refresh for all tables in a pipeline.

Databricks Asset Bundles (YAML):
resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
        table_configuration:
          auto_full_refresh_policy:
            enabled: true
            min_interval_hours: 24

Databricks notebook (Python):
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<destination-schema>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "catalog": "<destination-catalog>",
  "schema": "<destination-schema>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "<source-table>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>"
        }
      }
    ],
    "table_configuration": {
      "auto_full_refresh_policy": {
        "enabled": True,
        "min_interval_hours": 24
      }
    }
  }
}

Example: Configure auto full refresh per table

The following examples show how to enable auto full refresh at the pipeline level but disable it for specific tables.

Databricks Asset Bundles (YAML):
resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: table_1
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
          - table:
              source_schema: <source-schema>
              source_table: table_2
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                auto_full_refresh_policy:
                  enabled: false
                  min_interval_hours: 24
        table_configuration:
          auto_full_refresh_policy:
            enabled: true
            min_interval_hours: 24

Databricks notebook (Python):
gateway_pipeline_spec = {
  "pipeline_type": "INGESTION_GATEWAY",
  "name": "<gateway-name>",
  "catalog": "<destination-catalog>",
  "target": "<destination-schema>",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<destination-catalog>",
    "gateway_storage_schema": "<destination-schema>",
    "gateway_storage_name": "<destination-schema>"
  }
}

ingestion_pipeline_spec = {
  "pipeline_type": "MANAGED_INGESTION",
  "name": "<pipeline-name>",
  "catalog": "<destination-catalog>",
  "schema": "<destination-schema>",
  "ingestion_definition": {
    "ingestion_gateway_id": "<gateway-pipeline-id>",
    "source_type": "SQLSERVER",
    "objects": [
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "table_1",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>"
        }
      },
      {
        "table": {
          "source_schema": "<source-schema>",
          "source_table": "table_2",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "table_configuration": {
            "auto_full_refresh_policy": {
              "enabled": False,
              "min_interval_hours": 24
            }
          }
        }
      }
    ],
    "table_configuration": {
      "auto_full_refresh_policy": {
        "enabled": True,
        "min_interval_hours": 24
      }
    }
  }
}

In this example, `table_1` inherits the pipeline-level policy (enabled), and `table_2` overrides it with a table-level policy (disabled).
