Configure full refresh behavior for database connectors
Applies to: Database connectors, API-based pipeline authoring
Learn how to configure full refresh behavior for managed ingestion pipelines with database connectors (like SQL Server) in Lakeflow Connect. You can schedule when full refresh snapshots occur and enable automatic full refresh to recover from unsupported schema changes.
Full refresh window
A full refresh window lets you schedule when the snapshot operations for a full refresh occur. When you request a full refresh, or when the system automatically triggers one, the snapshot starts at the next available time within the configured window. The following table shows how the scheduling works:
| Request time | Window | Snapshot starts | Notes |
|---|---|---|---|
| Monday 2025-10-20 10:00:00 UTC | Start hour: 20, Days: Tuesday, Time zone: UTC | Tuesday 2025-10-21 20:00:00 UTC | Snapshot deferred to next available window day |
| Monday 2025-10-20 09:30:00 UTC | Start hour: 9, Days: Monday, Time zone: UTC | Monday 2025-10-20 09:30:00 UTC | Same day, request time within window |
| Monday 2025-10-20 10:00:00 UTC | Start hour: 9, Days: Monday, Time zone: UTC | Monday 2025-10-27 09:00:00 UTC | Request time past window, deferred to next week |
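The scheduling rule illustrated by the table can be sketched in Python. This is a minimal illustration, not the service's implementation. The function name next_snapshot_start is hypothetical. The one-hour window length is an assumption that is merely consistent with the rows above, because the text does not state how long a window stays open. Treating an omitted list of days as "every day" is also an assumption.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Assumption: the window length is not documented here. One hour is
# consistent with the table rows (09:30 is inside, 10:00 is outside).
WINDOW_LENGTH = timedelta(hours=1)

# Indexed by datetime.weekday(), where Monday is 0.
DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]


def next_snapshot_start(request_time, start_hour, days_of_week=None,
                        time_zone_id="UTC"):
    """Return the earliest time at or after request_time inside a window.

    request_time must be a timezone-aware datetime.
    """
    tz = ZoneInfo(time_zone_id)
    local_request = request_time.astimezone(tz)
    # Assumption: no days listed means the window is active every day.
    active_days = set(days_of_week) if days_of_week else set(DAY_NAMES)

    # Look at today plus the next seven days, so that a window missed
    # today can roll over to the same weekday next week.
    for offset in range(8):
        day = local_request.date() + timedelta(days=offset)
        if DAY_NAMES[day.weekday()] not in active_days:
            continue
        window_open = datetime(day.year, day.month, day.day,
                               start_hour, tzinfo=tz)
        window_close = window_open + WINDOW_LENGTH
        if local_request < window_open:
            return window_open       # wait for the window to open
        if local_request < window_close:
            return local_request     # already inside the window
    raise ValueError("days_of_week contains no recognized day name")


request = datetime(2025, 10, 20, 10, 0, tzinfo=ZoneInfo("UTC"))
print(next_snapshot_start(request, 20, ["TUESDAY"]))
```

Calling the function with the three request times and windows from the table reproduces the three "Snapshot starts" values under these assumptions.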
Configuration parameters
Configure the full refresh window in the ingestion_definition of your pipeline specification:
| Parameter | Type | Description | Required |
|---|---|---|---|
| start_hour | Integer | The start hour for the window (0-23) in the 24-hour day. | Yes |
| days_of_week | Array | Days when the window is active. Valid values are uppercase day names, for example MONDAY and TUESDAY. | No |
| time_zone_id | String | Time zone ID for the window. See Set the session time zone for supported time zone IDs. Defaults to UTC if not specified. | No |
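Before submitting a pipeline specification, it can help to check a window definition against the constraints in the table. The following is a rough client-side sketch, not part of any Databricks API. The function name validate_full_refresh_window is hypothetical, the field names start_hour, days_of_week, and time_zone_id are taken from the examples in this section, the set of seven uppercase day names is an assumption, and the IANA lookup through zoneinfo only approximates the list of supported time zone IDs.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Assumption: valid values are the seven uppercase English day names.
VALID_DAYS = {"MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
              "FRIDAY", "SATURDAY", "SUNDAY"}


def validate_full_refresh_window(window):
    """Return a list of problems found in a full_refresh_window dict."""
    errors = []

    # start_hour is required and must be an integer from 0 to 23.
    # bool is a subclass of int in Python, so reject it explicitly.
    start_hour = window.get("start_hour")
    if (isinstance(start_hour, bool) or not isinstance(start_hour, int)
            or not 0 <= start_hour <= 23):
        errors.append("start_hour is required and must be an integer from 0 to 23")

    # days_of_week is optional.
    for day in window.get("days_of_week", []):
        if day not in VALID_DAYS:
            errors.append(f"days_of_week has an unrecognized value: {day!r}")

    # time_zone_id is optional and defaults to UTC.
    time_zone_id = window.get("time_zone_id", "UTC")
    if not isinstance(time_zone_id, str):
        errors.append("time_zone_id must be a string")
    else:
        try:
            ZoneInfo(time_zone_id)
        except (ZoneInfoNotFoundError, ValueError):
            errors.append(f"time_zone_id is not a known time zone: {time_zone_id!r}")

    return errors


window = {
    "start_hour": 20,
    "days_of_week": ["MONDAY", "TUESDAY"],
    "time_zone_id": "America/Los_Angeles",
}
print(validate_full_refresh_window(window))
```

An empty list means the sketch found no problems. The service remains the authority on what it accepts.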
Example: Configure a full refresh window
The following examples show how to add a full refresh window to your pipeline definition.
Databricks Asset Bundles:

resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
        full_refresh_window:
          start_hour: 20
          days_of_week:
            - MONDAY
            - TUESDAY
          time_zone_id: 'America/Los_Angeles'

Databricks notebook:

gateway_pipeline_spec = {
    "pipeline_type": "INGESTION_GATEWAY",
    "name": "<gateway-name>",
    "catalog": "<destination-catalog>",
    "target": "<destination-schema>",
    "gateway_definition": {
        "connection_id": "<connection-id>",
        "gateway_storage_catalog": "<destination-catalog>",
        "gateway_storage_schema": "<destination-schema>",
        "gateway_storage_name": "<destination-schema>"
    }
}

ingestion_pipeline_spec = {
    "pipeline_type": "MANAGED_INGESTION",
    "name": "<pipeline-name>",
    "catalog": "<destination-catalog>",
    "schema": "<destination-schema>",
    "ingestion_definition": {
        "ingestion_gateway_id": "<gateway-pipeline-id>",
        "source_type": "SQLSERVER",
        "objects": [
            {
                "table": {
                    "source_schema": "<source-schema>",
                    "source_table": "<source-table>",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>"
                }
            }
        ],
        "full_refresh_window": {
            "start_hour": 20,
            "days_of_week": ["MONDAY", "TUESDAY"],
            "time_zone_id": "America/Los_Angeles"
        }
    }
}
Auto full refresh policy
An auto full refresh policy helps maintain data consistency without manual intervention. It automatically triggers a full refresh when the pipeline encounters any of the following unsupported DDL operations:
- Table truncate
- Incompatible schema changes (for example, data type changes)
- Column renames
- Column additions with default values
Without auto full refresh enabled, you must manually trigger a full refresh when these operations occur.
Configuration parameters
Configure auto full refresh at the pipeline level or table level in your pipeline specification:
| Parameter | Type | Description | Default |
|---|---|---|---|
| enabled | Boolean | Whether auto full refresh is enabled. | |
| min_interval_hours | Integer | Minimum wait interval in hours between full refreshes. The system waits for this interval since the last snapshot before initiating a new auto full refresh. | 24 |
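The interaction between the two policy fields can be sketched as a simple eligibility check. This is an illustration of the rule as described in the table, not the service's actual logic. The function name auto_full_refresh_allowed is hypothetical, the field names enabled and min_interval_hours are taken from the examples in this section, treating a missing enabled field as disabled is an assumption, and so is the choice to allow a refresh at exactly the interval boundary.

```python
from datetime import datetime, timedelta, timezone


def auto_full_refresh_allowed(last_snapshot, now, policy):
    """Return True if the policy permits a new auto full refresh at `now`."""
    # Assumption: a policy without an explicit "enabled" field is off.
    if not policy.get("enabled", False):
        return False
    # min_interval_hours defaults to 24 according to the table above.
    min_interval = timedelta(hours=policy.get("min_interval_hours", 24))
    # Assumption: a refresh is allowed once the full interval has elapsed,
    # including the exact boundary.
    return now - last_snapshot >= min_interval


policy = {"enabled": True, "min_interval_hours": 24}
last_snapshot = datetime(2025, 10, 20, 8, 0, tzinfo=timezone.utc)

# 23 hours after the last snapshot: still inside the wait interval.
print(auto_full_refresh_allowed(last_snapshot, last_snapshot + timedelta(hours=23), policy))
# 24 hours after the last snapshot: the interval has elapsed.
print(auto_full_refresh_allowed(last_snapshot, last_snapshot + timedelta(hours=24), policy))
```

If a full refresh window is also configured, the snapshot itself would still wait for the next available window, as described earlier in this article.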
You can configure auto full refresh at multiple levels:
- Pipeline level: In ingestion_definition.table_configuration.auto_full_refresh_policy
- Table level: In ingestion_definition.objects[].table.table_configuration.auto_full_refresh_policy
Table-level configuration overrides pipeline-level configuration.
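The override rule can be made concrete with a small helper that reports which policy applies to each table in a specification. This is only a sketch: the function name effective_policies is hypothetical, and it assumes that a table-level policy replaces the pipeline-level policy as a whole. The text does not say whether individual fields are merged.

```python
def effective_policies(ingestion_definition):
    """Map each source table name to the auto full refresh policy that applies."""
    # Pipeline-level policy, or None when the pipeline does not define one.
    pipeline_policy = (
        ingestion_definition
        .get("table_configuration", {})
        .get("auto_full_refresh_policy")
    )

    result = {}
    for obj in ingestion_definition.get("objects", []):
        table = obj.get("table")
        if table is None:
            continue  # skip entries that are not table objects
        table_policy = (
            table
            .get("table_configuration", {})
            .get("auto_full_refresh_policy")
        )
        # Assumption: a table-level policy replaces the pipeline-level
        # policy entirely instead of being merged field by field.
        result[table["source_table"]] = (
            table_policy if table_policy is not None else pipeline_policy
        )
    return result


ingestion_definition = {
    "objects": [
        {"table": {"source_table": "table_1"}},
        {"table": {
            "source_table": "table_2",
            "table_configuration": {
                "auto_full_refresh_policy": {"enabled": False, "min_interval_hours": 24}
            },
        }},
    ],
    "table_configuration": {
        "auto_full_refresh_policy": {"enabled": True, "min_interval_hours": 24}
    },
}

for name, policy in effective_policies(ingestion_definition).items():
    print(name, policy)
```

Here table_1 inherits the pipeline-level policy, and table_2 keeps its own table-level policy.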
Example: Configure auto full refresh at the pipeline level
The following examples show how to enable auto full refresh for all tables in a pipeline.
Databricks Asset Bundles:

resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: <source-table>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
        table_configuration:
          auto_full_refresh_policy:
            enabled: true
            min_interval_hours: 24

Databricks notebook:

gateway_pipeline_spec = {
    "pipeline_type": "INGESTION_GATEWAY",
    "name": "<gateway-name>",
    "catalog": "<destination-catalog>",
    "target": "<destination-schema>",
    "gateway_definition": {
        "connection_id": "<connection-id>",
        "gateway_storage_catalog": "<destination-catalog>",
        "gateway_storage_schema": "<destination-schema>",
        "gateway_storage_name": "<destination-schema>"
    }
}

ingestion_pipeline_spec = {
    "pipeline_type": "MANAGED_INGESTION",
    "name": "<pipeline-name>",
    "catalog": "<destination-catalog>",
    "schema": "<destination-schema>",
    "ingestion_definition": {
        "ingestion_gateway_id": "<gateway-pipeline-id>",
        "source_type": "SQLSERVER",
        "objects": [
            {
                "table": {
                    "source_schema": "<source-schema>",
                    "source_table": "<source-table>",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>"
                }
            }
        ],
        "table_configuration": {
            "auto_full_refresh_policy": {
                "enabled": True,
                "min_interval_hours": 24
            }
        }
    }
}
Example: Configure auto full refresh per table
The following examples show how to enable auto full refresh at the pipeline level but disable it for specific tables.
Databricks Asset Bundles:

resources:
  pipelines:
    gateway:
      name: <gateway-name>
      gateway_definition:
        connection_id: <connection-id>
        gateway_storage_catalog: <destination-catalog>
        gateway_storage_schema: <destination-schema>
        gateway_storage_name: <destination-schema>
      target: <destination-schema>
      catalog: <destination-catalog>

    pipeline_sqlserver:
      name: <pipeline-name>
      catalog: <destination-catalog>
      schema: <destination-schema>
      ingestion_definition:
        ingestion_gateway_id: <gateway-id>
        objects:
          - table:
              source_schema: <source-schema>
              source_table: table_1
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
          - table:
              source_schema: <source-schema>
              source_table: table_2
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              table_configuration:
                auto_full_refresh_policy:
                  enabled: false
                  min_interval_hours: 24
        table_configuration:
          auto_full_refresh_policy:
            enabled: true
            min_interval_hours: 24

Databricks notebook:

gateway_pipeline_spec = {
    "pipeline_type": "INGESTION_GATEWAY",
    "name": "<gateway-name>",
    "catalog": "<destination-catalog>",
    "target": "<destination-schema>",
    "gateway_definition": {
        "connection_id": "<connection-id>",
        "gateway_storage_catalog": "<destination-catalog>",
        "gateway_storage_schema": "<destination-schema>",
        "gateway_storage_name": "<destination-schema>"
    }
}

ingestion_pipeline_spec = {
    "pipeline_type": "MANAGED_INGESTION",
    "name": "<pipeline-name>",
    "catalog": "<destination-catalog>",
    "schema": "<destination-schema>",
    "ingestion_definition": {
        "ingestion_gateway_id": "<gateway-pipeline-id>",
        "source_type": "SQLSERVER",
        "objects": [
            {
                "table": {
                    "source_schema": "<source-schema>",
                    "source_table": "table_1",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>"
                }
            },
            {
                "table": {
                    "source_schema": "<source-schema>",
                    "source_table": "table_2",
                    "destination_catalog": "<destination-catalog>",
                    "destination_schema": "<destination-schema>",
                    "table_configuration": {
                        "auto_full_refresh_policy": {
                            "enabled": False,
                            "min_interval_hours": 24
                        }
                    }
                }
            }
        ],
        "table_configuration": {
            "auto_full_refresh_policy": {
                "enabled": True,
                "min_interval_hours": 24
            }
        }
    }
}
In this example, table_1 uses the pipeline-level policy (enabled), whereas table_2 overrides it with table-level configuration (disabled).