SQL Server ingestion pipeline maintenance

Preview

LakeFlow Connect is in gated Public Preview. To participate in the preview, contact your Databricks account team.

This article describes ongoing operations for maintaining SQL Server ingestion pipelines using the LakeFlow Connect UI.

Specify tables to ingest

The Pipelines API provides two methods to specify tables to ingest in the objects field of the ingestion_definition:

  • Table specification: Ingests an individual table from the specified source catalog and schema to the specified destination catalog and schema.

    {"table": {
    
            "source_catalog": ...,
    
            "source_schema": ...,
    
            "source_table": ...,
    
            "destination_catalog": ...,
    
            "destination_schema": ...
    
            }}
    
  • Schema specification: Ingests all tables from the specified source catalog and schema into the specified catalog and schema.

    {"schema": {
    
            "source_catalog": ...,
    
            "source_schema": ...,
    
            "destination_catalog": ...,
    
            "destination_schema": ...
    
            }}
    

When specifying the source_catalog, source_schema, and source_table names in the preceding example, the exact letter case needs to be used. For example, if the source database catalog is specified as Marketing in sys.databases, the same casing should be used instead of, for example, marketing. Otherwise, no tables will be discovered for ingestion.

When using schema specifications, all new tables in the source schema are automatically ingested to the destination. Ignoring dropped tables is not supported. If a table gets dropped in the source database, it needs to be explicitly removed from the ingestion pipeline definition. In the case of full schema replication, it needs to be replaced with an explicit set of tables.

To decrease the load on the source database, the gateway only checks periodically for new tables. It might take up to 6 hours for new tables to be discovered. If you want to speed up this process, restart the gateway.

Multiple table or schema specifications can be included in the objects field of the ingestion_definition. None of the source table names in different source schemas can overlap. This results in ingestion pipeline failure.

Source DDL capture and schema evolution

The SQL Server connector can track changes to the table schema of the source tables. Some table schema changes are handled automatically and the schema of the corresponding target table evolves automatically.

The following table schema changes are handled automatically:

ALTER TABLE ADD COLUMN with no default value.

All other table schema changes are considered incompatible schema changes.

ALTER TABLE … ADD COLUMN

The ingestion pipeline automatically restarts to include the new column(s). Rows in the target Delta table prior to the schema change have the value of the new column(s) set to NULL.

Incompatible schema changes

An incompatible schema change causes the ingestion pipeline to fail with an INCOMPATIBLE_SCHEMA_CHANGE error. To continue replication, trigger a full refresh of the affected tables.

Note

Databricks can not guarantee that at time the ingestion pipeline fails for an incompatible schema change, all rows prior to the schema change have been ingested.

Change the ingestion pipeline schedule

  1. In the Databricks workspace, click Job Runs.

  2. Click the Jobs tab.

  3. Click the job, and then modify the Schedules & Triggers section.

Full refresh of target tables

You can use the full refresh capability in Delta Live Tables with your ingestion pipelines. A full refresh synchronizes the schema of one or more target tables with the schema of the corresponding source tables and re-copies all the table data. This is useful in the case of an incompatible schema change.

To trigger a full refresh on selected tables, click Select tables for refresh in the DLT pipeline UI for the ingestion pipeline. Select the desired tables and click Full refresh selection.

To perform a full refresh on all tables in an ingestion pipeline, click the drop-down menu next to the Start button and select Full refresh all.

A DLT pipeline update with a full refresh of one or more tables can be recognized by the phase after the initialization phase during the pipeline start-up.

Important

The ingestion pipeline update might fail during the Initializing or Resetting tables phase. DLT will retry the pipeline automatically several times. If the automatic retries are manually interrupted or eventually fail fatally, start the new pipeline update manually with the table refresh selection from before. Failing to do so can result in the target tables being left in an inconsistent state with partial data. If manual retries also fail, create a support ticket.

Alerts and notifications

Alerts for pipeline failures are automatically set up for all gateways, managed ingestion pipelines, and managed ingestion scheduling jobs. You can customize alerts using the Pipeline API. See Notifications in the PUT /api/2.0/pipelines/{pipeline_id} documentation.