Query-based connector reference

Preview

This feature is in Public Preview.

This page contains reference documentation for query-based connectors in Lakeflow Connect, including configuration parameters, cursor column requirements, deletion tracking syntax, and error conditions.

Pipeline configuration

The following parameters are supported for query-based connectors. Set these values in the ingestion_definition block of your pipeline configuration.

  • connection_name (string; required for foreign connection ingestion): The name of the connection that stores authentication credentials for the source database. If ingest_from_uc_foreign_catalog is true, do not specify connection_name.
  • ingest_from_uc_foreign_catalog (boolean; required for foreign catalog ingestion): Set to true to ingest from a foreign catalog backed by Lakehouse Federation. If true, do not specify connection_name.
  • source_catalog (string; required): The catalog in the source system (or the foreign catalog name if ingest_from_uc_foreign_catalog is true).
  • source_schema (string; required): The schema in the source system to ingest from.
  • source_table (string; required for table-level objects): The table in the source schema to ingest.
  • cursor_column (string; required for foreign connection ingestion): The name of the cursor column to use for incremental tracking. Must be a single monotonically increasing column. See Cursor column requirements.
  • cursor_columns (list of strings; required for foreign catalog ingestion): A list containing the cursor column name. Only one column is supported.
  • primary_keys (list of strings; required unless the history tracking (SCD) mode is APPEND_ONLY, for foreign catalog ingestion): The primary key columns used to identify rows for merge operations. Required for SCD_TYPE_1 and SCD_TYPE_2 modes.
  • deletion_condition (string; optional): A SQL expression that evaluates to true for rows that represent soft deletions. Only configurable using the API. See Deletion condition.
  • hard_deletion_sync_min_interval_in_seconds (integer; optional): The minimum interval in seconds between primary key snapshot scans for detecting hard deletions. If not set, hard-deletion tracking is disabled. Only configurable using the API (Beta). See Hard-deletion tracking.
  • scd_type (string; optional): The history tracking (SCD) mode for the destination table. Supported values: SCD_TYPE_1 (default), SCD_TYPE_2, APPEND_ONLY.
  • destination_catalog (string; required): The catalog to write the destination table to.
  • destination_schema (string; required): The schema to write the destination table to.
  • destination_table (string; optional): The name of the destination table. Defaults to the source table name if not specified. See Name a destination table.
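As an illustration, the parameters above might be combined into a pipeline spec as follows. This is a sketch for foreign connection ingestion; the connection, catalog, schema, and table names are placeholders, and the exact nesting of your spec may differ depending on how you create the pipeline.

```json
{
  "ingestion_definition": {
    "connection_name": "my_sql_connection",
    "objects": [
      {
        "table": {
          "source_catalog": "sales_db",
          "source_schema": "dbo",
          "source_table": "orders",
          "cursor_column": "updated_at",
          "destination_catalog": "main",
          "destination_schema": "ingested",
          "destination_table": "orders"
        }
      }
    ]
  }
}
```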

Cursor column requirements

The cursor column tracks incremental progress between pipeline runs. The connector stores the maximum cursor value after each successful run and uses it as the exclusive lower bound of the next run's filter.

Requirements:

  • You must specify a single cursor column. You can't specify multiple columns as a composite cursor. If you specify more than one, the pipeline fails with INVALID_CURSOR_COLUMNS.
  • The cursor column must increase monotonically. Values must never decrease. Rows with cursor values at or below the stored high-water mark are not reingested on subsequent runs.
  • Rows with a NULL cursor column are not ingested.
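The rules above can be sketched in Python. This is a simplified model of the high-water-mark behavior, not the connector's implementation: rows with a NULL cursor or a cursor value at or below the stored watermark are skipped, and the watermark advances to the maximum value ingested.

```python
def incremental_scan(rows, cursor_col, high_water_mark):
    """Return (ingested_rows, new_high_water_mark) for one run.

    Rows whose cursor value is None, or at or below the stored
    high-water mark, are skipped; the watermark advances to the
    maximum cursor value that was ingested.
    """
    ingested = [
        r for r in rows
        if r[cursor_col] is not None
        and (high_water_mark is None or r[cursor_col] > high_water_mark)
    ]
    new_hwm = max((r[cursor_col] for r in ingested), default=high_water_mark)
    return ingested, new_hwm

# First run: every row with a non-NULL cursor is ingested.
rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": None},  # NULL cursor: never ingested
]
batch, hwm = incremental_scan(rows, "updated_at", None)

# Second run: only rows strictly above the stored watermark are ingested.
rows.append({"id": 4, "updated_at": 30})
batch2, hwm2 = incremental_scan(rows, "updated_at", hwm)
```

Note that because the filter is exclusive, a source row updated without its cursor value increasing past the watermark is not reingested; this is why the cursor must increase monotonically.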

Supported column types:

  • Timestamp or date types (recommended)
  • Numeric types (integer, long integer, decimal, and double)
  • Binary types (encoded as binary numbers)
  • String types

Deletion condition

The deletion_condition parameter lets query-based connectors detect soft deletions. This feature is only configurable using the API.

The value is a SQL expression that evaluates to true for rows that should be treated as deleted in the destination table. The connector evaluates this expression during each pipeline run.

Syntax:

"deletion_condition": "<sql-expression>"

Examples:

JSON
"deletion_condition": "deleted_at IS NOT NULL"
"deletion_condition": "is_deleted = 1"
"deletion_condition": "status = 'DELETED'"

When a row matches the deletion_condition, the connector removes the corresponding row from the destination table (for SCD_TYPE_1) or marks it as deleted (for SCD_TYPE_2).
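A simplified model of this behavior follows. This is an illustrative sketch, not the connector's implementation; the function, the __deleted flag, and the key column name are all hypothetical.

```python
def apply_soft_deletes(dest_rows, deleted_keys, scd_type, key="id"):
    """Apply soft deletions detected via deletion_condition.

    SCD_TYPE_1: physically remove matching rows from the destination.
    SCD_TYPE_2: keep matching rows but mark them as deleted.
    """
    if scd_type == "SCD_TYPE_1":
        return [r for r in dest_rows if r[key] not in deleted_keys]
    if scd_type == "SCD_TYPE_2":
        return [{**r, "__deleted": r[key] in deleted_keys} for r in dest_rows]
    return dest_rows  # APPEND_ONLY: deletions are not applied

dest = [{"id": 1}, {"id": 2}]
scd1 = apply_soft_deletes(dest, {2}, "SCD_TYPE_1")  # row 2 removed
scd2 = apply_soft_deletes(dest, {2}, "SCD_TYPE_2")  # row 2 retained, flagged
```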

Hard-deletion tracking

The hard_deletion_sync_min_interval_in_seconds parameter enables query-based connectors to detect hard deletions — rows that have been physically removed from the source table. This feature is in Beta and is only configurable using the API.

Hard-deletion tracking works by periodically taking a snapshot of the primary keys present in the source table and comparing them to the destination table. Rows present in the destination but absent from the source snapshot are treated as deleted.

The value sets the minimum interval in seconds between snapshot scans. This value acts as a lower bound: if the pipeline runs less frequently than this interval, hard-deletion synchronization aligns with the actual ingestion frequency rather than running more often. Updating this parameter does not trigger a full snapshot.

note

Hard-deletion tracking requires primary_keys to be configured on the table.

Syntax:

"hard_deletion_sync_min_interval_in_seconds": <integer>

Example:

JSON
"hard_deletion_sync_min_interval_in_seconds": 86400

This example triggers a hard-deletion scan no more than once every 24 hours (86400 seconds).
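Conceptually, the snapshot comparison described above reduces to a set difference over primary keys. The following is a simplified sketch of that comparison, not the connector's implementation:

```python
def detect_hard_deletions(source_pks, destination_pks):
    """Primary keys present in the destination but absent from the
    latest source snapshot are treated as hard-deleted rows."""
    return set(destination_pks) - set(source_pks)

# The latest source snapshot no longer contains keys 3 and 5,
# so those destination rows are treated as hard-deleted.
deleted = detect_hard_deletions(
    source_pks=[1, 2, 4],
    destination_pks=[1, 2, 3, 4, 5],
)
```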

Error conditions

The most common error is INVALID_CURSOR_COLUMNS. This occurs when the cursor column is not configured correctly. See Invalid cursor columns for troubleshooting steps.