Query-based connector reference
This feature is in Public Preview.
This page contains reference documentation for query-based connectors in Lakeflow Connect, including configuration parameters, cursor column requirements, deletion tracking syntax, and error conditions.
Pipeline configuration
The following parameters are supported for query-based connectors. Set these values in the ingestion_definition block of your pipeline configuration.
| Parameter | Type | Required | Description |
|---|---|---|---|
| string | Required for foreign connection ingestion | The name of the connection that stores authentication credentials for the source database. If |
| boolean | Required for foreign catalog ingestion | Set to |
| string | Yes | The catalog in the source system (or the foreign catalog name if |
| string | Yes | The schema in the source system to ingest from. |
| string | Required for table-level objects | The table in the source schema to ingest. |
| string | Required for foreign connection ingestion | The name of the cursor column to use for incremental tracking. Must be a single monotonically increasing column. See Cursor column requirements. |
| list of strings | Required for foreign catalog ingestion | A list containing the cursor column name. Only one column is supported. |
| list of strings | Required unless the history tracking (SCD) mode is | The primary key columns used to identify rows for merge operations. Required for |
| string | No | A SQL expression that evaluates to |
| integer | No | The minimum interval in seconds between primary key snapshot scans for detecting hard deletions. If not set, hard-deletion tracking is disabled. Only configurable using the API (Beta). See Hard-deletion tracking. |
| string | No | The history tracking (SCD) mode for the destination table. Supported values: |
| string | Yes | The catalog to write the destination table to. |
| string | Yes | The schema to write the destination table to. |
| string | No | The name of the destination table. Defaults to the source table name if not specified. See Name a destination table. |
Cursor column requirements
The cursor column tracks incremental progress between pipeline runs. The connector stores the maximum cursor value after each successful run and uses it as the lower bound filter on the next run.
Requirements:
- You must specify a single cursor column. You can't specify multiple columns as a composite cursor. If you specify more than one, the pipeline fails with INVALID_CURSOR_COLUMNS.
- The cursor column must increase monotonically. Values must never decrease. Rows with cursor values at or below the stored high-water mark are not reingested on subsequent runs.
- Rows with a NULL cursor column are not ingested.
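The high-water-mark behavior described above can be illustrated with a small simulation. This is a minimal sketch of the documented semantics, not the connector's implementation: the function name `incremental_batch`, the in-memory row dictionaries, and the `updated_at` column are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def incremental_batch(
    rows: List[Row], cursor_column: str, high_water_mark: Optional[Any]
) -> Tuple[List[Row], Optional[Any]]:
    """Return the rows a run would ingest and the new high-water mark."""
    selected = []
    for row in rows:
        value = row.get(cursor_column)
        if value is None:
            continue  # rows with a NULL cursor value are not ingested
        if high_water_mark is not None and value <= high_water_mark:
            continue  # at or below the stored mark: not reingested
        selected.append(row)
    # The mark only moves forward; it is unchanged if nothing was selected.
    new_mark = max((r[cursor_column] for r in selected), default=high_water_mark)
    return selected, new_mark


# Hypothetical source table; updated_at stands in for the cursor column.
source = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": None},
]
first, mark_after_first = incremental_batch(source, "updated_at", None)

# A row that arrives later with a cursor value equal to the mark is skipped.
source.append({"id": 4, "updated_at": 20})
source.append({"id": 5, "updated_at": 30})
second, mark_after_second = incremental_batch(source, "updated_at", mark_after_first)
```

In this sketch the row with `id` 4 is never picked up, because its cursor value does not exceed the stored mark. This is why a cursor column whose values can arrive late or repeat is a poor fit.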
Supported column types:
- Timestamp or date types (recommended)
- Numeric types (integer, long integer, decimal, and double)
- Binary types (encoded as binary numbers)
- String types
Deletion condition
The deletion_condition parameter lets query-based connectors detect soft deletions. This feature is only configurable using the API.
The value is a SQL expression that evaluates to true for rows that should be treated as deleted in the destination table. The connector evaluates this expression during each pipeline run.
Syntax:
"deletion_condition": "<sql-expression>"
Examples:
"deletion_condition": "deleted_at IS NOT NULL"
"deletion_condition": "is_deleted = 1"
"deletion_condition": "status = 'DELETED'"
When a row matches the deletion_condition, the connector removes the corresponding row from the destination table (for SCD_TYPE_1) or marks it as deleted (for SCD_TYPE_2).
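The difference between the two modes can be sketched as follows. This is an illustrative simplification under stated assumptions: the destination is modeled as a dictionary keyed by primary key, the deletion condition is a Python predicate standing in for the SQL expression `deleted_at IS NOT NULL`, and the `_deleted` marker is a hypothetical field (the actual SCD type 2 representation, including history rows, is not modeled here).

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def apply_batch(
    destination: Dict[Any, Row],
    batch: List[Row],
    key: str,
    deletion_condition: Callable[[Row], bool],
    scd_mode: str,
) -> None:
    """Apply a batch of source rows to an in-memory destination table."""
    for row in batch:
        pk = row[key]
        if not deletion_condition(row):
            destination[pk] = dict(row)  # ordinary upsert
        elif scd_mode == "SCD_TYPE_1":
            destination.pop(pk, None)  # remove the matching row
        else:
            # SCD_TYPE_2: keep the row and mark it as deleted.
            marked = dict(destination.get(pk, row))
            marked["_deleted"] = True  # hypothetical marker field
            destination[pk] = marked


def is_deleted(row: Row) -> bool:
    # Python stand-in for the SQL expression: deleted_at IS NOT NULL
    return row["deleted_at"] is not None


batch = [
    {"id": 1, "deleted_at": None},
    {"id": 2, "deleted_at": "2024-01-01"},
]

scd1 = {1: {"id": 1, "deleted_at": None}, 2: {"id": 2, "deleted_at": None}}
apply_batch(scd1, batch, "id", is_deleted, "SCD_TYPE_1")

scd2 = {1: {"id": 1, "deleted_at": None}, 2: {"id": 2, "deleted_at": None}}
apply_batch(scd2, batch, "id", is_deleted, "SCD_TYPE_2")
```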
Hard-deletion tracking
The hard_deletion_sync_min_interval_in_seconds parameter enables query-based connectors to detect hard deletions — rows that have been physically removed from the source table. This feature is in Beta and is only configurable using the API.
Hard-deletion tracking works by periodically taking a snapshot of the primary keys present in the source table and comparing them to the destination table. Rows present in the destination but absent from the source snapshot are treated as deleted.
The value sets the minimum interval in seconds between snapshot scans. This value acts as a lower bound: if the pipeline runs less frequently than this interval, hard-deletion synchronization aligns with the actual ingestion frequency rather than running more often. Updating this parameter does not trigger a full snapshot.
Hard-deletion tracking requires primary_keys to be configured on the table.
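Conceptually, the snapshot comparison is a set difference over primary keys. The following is a minimal sketch of that idea only; `find_hard_deleted` is a hypothetical helper, and keys are modeled as tuples so that composite primary keys work the same way as single-column keys.

```python
from typing import Iterable, Set, Tuple

Key = Tuple[object, ...]


def find_hard_deleted(
    source_keys: Iterable[Key], destination_keys: Iterable[Key]
) -> Set[Key]:
    """Return keys present in the destination but absent from the source snapshot."""
    return set(destination_keys) - set(source_keys)


# Hypothetical snapshots: the row with key (2,) was physically removed at the source.
source_snapshot = [(1,), (3,)]
destination_snapshot = [(1,), (2,), (3,)]
deleted = find_hard_deleted(source_snapshot, destination_snapshot)
```

The comparison depends entirely on the primary key identifying a row in both systems, which is why the feature cannot work without configured primary keys.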
Syntax:
"hard_deletion_sync_min_interval_in_seconds": <integer>
Example:
"hard_deletion_sync_min_interval_in_seconds": 86400
This example triggers a hard-deletion scan no more than once every 24 hours (86400 seconds).
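The lower-bound behavior of the interval can be sketched as a simple gate evaluated on each pipeline run. This is an assumption-laden illustration rather than the connector's scheduling logic: `should_scan` and `scan_times` are hypothetical helpers, times are plain seconds, and the sketch assumes the first run always scans.

```python
from typing import List, Optional


def should_scan(
    now: int, last_scan: Optional[int], min_interval_seconds: Optional[int]
) -> bool:
    """Decide whether a run should include a hard-deletion scan."""
    if min_interval_seconds is None:
        return False  # parameter not set: hard-deletion tracking is disabled
    if last_scan is None:
        return True  # assumption: the first run performs a scan
    return now - last_scan >= min_interval_seconds


def scan_times(run_times: List[int], min_interval_seconds: Optional[int]) -> List[int]:
    """Return the run times at which a scan would happen."""
    scans: List[int] = []
    last: Optional[int] = None
    for t in run_times:
        if should_scan(t, last, min_interval_seconds):
            scans.append(t)
            last = t
    return scans


HOUR = 3600
every_6_hours = [h * HOUR for h in range(0, 49, 6)]  # runs at 0h, 6h, ..., 48h
every_48_hours = [0, 48 * HOUR, 96 * HOUR]

frequent = scan_times(every_6_hours, 86400)
infrequent = scan_times(every_48_hours, 86400)
disabled = scan_times(every_6_hours, None)
```

With runs every 6 hours, the gate lets a scan through only once 24 hours have elapsed. With runs every 48 hours, every run scans, because the pipeline schedule is already slower than the interval.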
Error conditions
The most common error is INVALID_CURSOR_COLUMNS. This occurs when the cursor column is not configured correctly. See Invalid cursor columns for troubleshooting steps.
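One way to catch the most obvious misconfiguration before submitting a pipeline is a client-side check that exactly one cursor column is given. The sketch below is hypothetical: `validate_cursor_columns` is not part of any Lakeflow Connect API, and it raises a plain Python `ValueError` that merely reuses the error name for readability.

```python
from typing import List


def validate_cursor_columns(cursor_columns: List[str]) -> str:
    """Return the single cursor column, or raise if the list is not exactly one entry."""
    if len(cursor_columns) != 1:
        raise ValueError(
            "INVALID_CURSOR_COLUMNS: expected exactly one cursor column, "
            f"got {cursor_columns!r}"
        )
    return cursor_columns[0]


valid = validate_cursor_columns(["updated_at"])

try:
    validate_cursor_columns(["updated_at", "id"])  # composite cursor: rejected
    composite_rejected = False
except ValueError:
    composite_rejected = True
```

This check covers only the column count. It cannot verify that the column increases monotonically or that its type is supported.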