Query-based connector reference

Preview

This feature is in Public Preview.

This page contains reference documentation for query-based connectors in Lakeflow Connect, including configuration parameters, cursor column requirements, deletion tracking syntax, and error conditions.

Pipeline configuration

The following parameters are supported for query-based connectors. Set these values in the ingestion_definition block of your pipeline configuration.

  • connection_name (string; required for foreign connection ingestion): The name of the connection that stores authentication credentials for the source database. If ingest_from_uc_foreign_catalog is true, do not specify connection_name.
  • ingest_from_uc_foreign_catalog (boolean; required for foreign catalog ingestion): Set to true to ingest from a foreign catalog backed by Lakehouse Federation. If true, do not specify connection_name.
  • source_catalog (string; required): The catalog in the source system (or the foreign catalog name if ingest_from_uc_foreign_catalog is true).
  • source_schema (string; required): The schema in the source system to ingest from.
  • source_table (string; required for table-level objects): The table in the source schema to ingest.
  • cursor_column (string; required for foreign connection ingestion): The name of the cursor column to use for incremental tracking. Must be a single monotonically increasing column. See Cursor column requirements.
  • cursor_columns (list of strings; required for foreign catalog ingestion): A list containing the cursor column name. Only one column is supported.
  • primary_keys (list of strings; required unless the history tracking (SCD) mode is APPEND_ONLY, for foreign catalog ingestion): The primary key columns used to identify rows for merge operations. Required for SCD_TYPE_1 and SCD_TYPE_2 modes.
  • deletion_condition (string; optional): A SQL expression that evaluates to true for rows that represent soft deletions. Only configurable using the API. See Deletion condition.
  • hard_deletion_sync_min_interval_in_seconds (integer; optional): The minimum interval in seconds between primary key snapshot scans for detecting hard deletions. If not set, hard-deletion tracking is disabled. Only configurable using the API (Beta). See Hard-deletion tracking.
  • scd_type (string; optional): The history tracking (SCD) mode for the destination table. Supported values: SCD_TYPE_1 (default), SCD_TYPE_2, APPEND_ONLY.
  • destination_catalog (string; required): The catalog to write the destination table to.
  • destination_schema (string; required): The schema to write the destination table to.
  • destination_table (string; optional): The name of the destination table. Defaults to the source table name if not specified. See Name a destination table.
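As an illustration, the parameters above might be combined into a pipeline spec as follows. This is a sketch for foreign connection ingestion; the connection, catalog, schema, and table names are placeholders, and the exact nesting of your spec may differ depending on how you create the pipeline.

```json
{
  "ingestion_definition": {
    "connection_name": "my_sql_connection",
    "objects": [
      {
        "table": {
          "source_catalog": "sales_db",
          "source_schema": "dbo",
          "source_table": "orders",
          "cursor_column": "updated_at",
          "destination_catalog": "main",
          "destination_schema": "ingested",
          "destination_table": "orders"
        }
      }
    ]
  }
}
```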

Cursor column requirements

The cursor column tracks incremental progress between pipeline runs. The connector stores the maximum cursor value after each successful run and uses it as the exclusive lower bound of the next run's filter.

Requirements:

  • You must specify a single cursor column. You can't specify multiple columns as a composite cursor. If you specify more than one, the pipeline fails with INVALID_CURSOR_COLUMNS.
  • The cursor column must increase monotonically. Values must never decrease. Rows with cursor values at or below the stored high-water mark are not reingested on subsequent runs.
  • Rows with a NULL cursor column are not ingested.
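The rules above can be sketched in Python. This is a simplified model of the high-water-mark behavior, not the connector's implementation: rows with a NULL cursor or a cursor value at or below the stored watermark are skipped, and the watermark advances to the maximum value ingested.

```python
def incremental_scan(rows, cursor_col, high_water_mark):
    """Return (ingested_rows, new_high_water_mark) for one run.

    Rows whose cursor value is None, or at or below the stored
    high-water mark, are skipped; the watermark advances to the
    maximum cursor value that was ingested.
    """
    ingested = [
        r for r in rows
        if r[cursor_col] is not None
        and (high_water_mark is None or r[cursor_col] > high_water_mark)
    ]
    new_hwm = max((r[cursor_col] for r in ingested), default=high_water_mark)
    return ingested, new_hwm

# First run: every row with a non-NULL cursor is ingested.
rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": None},  # NULL cursor: never ingested
]
batch, hwm = incremental_scan(rows, "updated_at", None)

# Second run: only rows strictly above the stored watermark are ingested.
rows.append({"id": 4, "updated_at": 30})
batch2, hwm2 = incremental_scan(rows, "updated_at", hwm)
```

Note that because the filter is exclusive, a source row updated without its cursor value increasing past the watermark is not reingested; this is why the cursor must increase monotonically.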

Supported column types:

  • Timestamp or date types (recommended)
  • Numeric types (integer, long integer, decimal, and double)
  • Binary types (encoded as binary numbers)
  • String types

Deletion condition

The deletion_condition parameter lets query-based connectors detect soft deletions. This feature is only configurable using the API.

The value is a SQL expression that evaluates to true for rows that should be treated as deleted in the destination table. The connector evaluates this expression during each pipeline run.

Syntax:

"deletion_condition": "<sql-expression>"

Examples:

JSON
"deletion_condition": "deleted_at IS NOT NULL"
"deletion_condition": "is_deleted = 1"
"deletion_condition": "status = 'DELETED'"

When a row matches the deletion_condition, the connector removes the corresponding row from the destination table (for SCD_TYPE_1) or marks it as deleted (for SCD_TYPE_2).
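A simplified model of this behavior follows. This is an illustrative sketch, not the connector's implementation; the function, the __deleted flag, and the key column name are all hypothetical.

```python
def apply_soft_deletes(dest_rows, deleted_keys, scd_type, key="id"):
    """Apply soft deletions detected via deletion_condition.

    SCD_TYPE_1: physically remove matching rows from the destination.
    SCD_TYPE_2: keep matching rows but mark them as deleted.
    """
    if scd_type == "SCD_TYPE_1":
        return [r for r in dest_rows if r[key] not in deleted_keys]
    if scd_type == "SCD_TYPE_2":
        return [{**r, "__deleted": r[key] in deleted_keys} for r in dest_rows]
    return dest_rows  # APPEND_ONLY: deletions are not applied

dest = [{"id": 1}, {"id": 2}]
scd1 = apply_soft_deletes(dest, {2}, "SCD_TYPE_1")  # row 2 removed
scd2 = apply_soft_deletes(dest, {2}, "SCD_TYPE_2")  # row 2 retained, flagged
```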

Hard-deletion tracking

The hard_deletion_sync_min_interval_in_seconds parameter enables query-based connectors to detect hard deletions — rows that have been physically removed from the source table. This feature is in Beta and is only configurable using the API.

Hard-deletion tracking works by periodically taking a snapshot of the primary keys present in the source table and comparing them to the destination table. Rows present in the destination but absent from the source snapshot are treated as deleted.

The value sets the minimum interval in seconds between snapshot scans. This value acts as a lower bound: if the pipeline runs less frequently than this interval, hard-deletion synchronization aligns with the actual ingestion frequency rather than running more often. Updating this parameter does not trigger a full snapshot.

note

Hard-deletion tracking requires primary_keys to be configured on the table.

Syntax:

"hard_deletion_sync_min_interval_in_seconds": <integer>

Example:

JSON
"hard_deletion_sync_min_interval_in_seconds": 86400

This example triggers a hard-deletion scan no more than once every 24 hours (86400 seconds).
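Conceptually, the snapshot comparison described above reduces to a set difference over primary keys. The following is a simplified sketch of that comparison, not the connector's implementation:

```python
def detect_hard_deletions(source_pks, destination_pks):
    """Primary keys present in the destination but absent from the
    latest source snapshot are treated as hard-deleted rows."""
    return set(destination_pks) - set(source_pks)

# The latest source snapshot no longer contains keys 3 and 5,
# so those destination rows are treated as hard-deleted.
deleted = detect_hard_deletions(
    source_pks=[1, 2, 4],
    destination_pks=[1, 2, 3, 4, 5],
)
```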

Error conditions

The most common error is INVALID_CURSOR_COLUMNS. This occurs when the cursor column is not configured correctly. See Invalid cursor columns for troubleshooting steps.