Select rows to ingest

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

Applies to: API-based pipeline authoring

Row filtering allows you to ingest only the data you need by applying conditions similar to a SQL WHERE clause. This improves performance (especially for initial loads with historical data) and minimizes data duplication (especially in development environments).

Supported connectors

  • Google Analytics
  • Salesforce
  • ServiceNow

How row filtering works

Row filtering acts like a WHERE filter in SQL. You can compare values in the source against integers, booleans, strings, and other data types. You can also use complex combinations of clauses to pull only the data you need.

Row filtering applies during both the initial load and subsequent incremental updates.
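
For example, the following filter combines an integer comparison with a boolean check. The column names u_age and u_active are illustrative, not part of any specific source schema; see the per-connector limitations below:

JSON
"row_filter": "u_age >= 40 AND u_active = TRUE"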

Limitations

Row filtering has the following limitations:

  • Salesforce: Row filtering is only supported on two columns: the primary key (ID, if available) and the cursor column. The connector selects the cursor column from the following list, in order of preference: SystemModstamp, LastModifiedDate, CreatedDate, and LoginTime. For a filter that respects these constraints, see the example after this list.

  • ServiceNow:

    • Only the AND operator is supported. The OR operator is not currently available. For example, u_age = 40 AND u_active = TRUE works, but u_age = 40 OR u_active = TRUE does not.
    • Timestamps in the filters must be in the following format: YYYY-MM-DD HH:mm:SS (for example, 2004-03-02 17:14:59).
  • Row or query updates: If a row matched the filter on the initial load but the row or the query is later updated so that it no longer matches, the connector does not delete the row. Likewise, if the query is updated so that a previously non-matching row now matches, the connector does not ingest that row in subsequent updates; a full refresh is required to pick it up.
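
For example, a Salesforce filter that respects these constraints references only the cursor column and the primary key (the values shown are illustrative):

JSON
"row_filter": "SystemModstamp > '2025-06-10T23:40:11.000-07:00' AND Id != 'a00Qy00000vps2NIAQ'"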

Configure row filtering

To configure a pipeline with row filtering, add the row_filter config to your pipeline specification. For example:

Python
pipeline_spec = """
{
"name": "...",
"ingestion_definition": {
"connection_name": "...",
"objects": [
{
"table": {
"source_schema": "...",
"source_table": "...",
"destination_catalog": "...",
"destination_schema": "...",
"destination_table": "...",
"table_configuration": {
"row_filter": "details go here; see examples below"
}
}
}
]
},
"channel": "PREVIEW"
}
"""
create_pipeline(pipeline_spec)
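
If the filter itself contains quotes, it can be simpler to build the spec as a Python dictionary and serialize it with json.dumps rather than hand-writing the JSON string. This is a minimal sketch that assumes the same create_pipeline helper used above:

Python
import json

# Building the spec as a dict avoids manual escaping when the
# row filter contains quotes.
spec = {
    "name": "...",
    "ingestion_definition": {
        "connection_name": "...",
        "objects": [
            {
                "table": {
                    "source_schema": "...",
                    "source_table": "...",
                    "destination_catalog": "...",
                    "destination_schema": "...",
                    "destination_table": "...",
                    "table_configuration": {
                        # Single-quoted literals inside the filter pass
                        # through json.dumps unchanged.
                        "row_filter": "SystemModstamp > '2025-06-10T23:40:11.000-07:00'"
                    },
                }
            }
        ],
    },
    "channel": "PREVIEW",
}

create_pipeline(json.dumps(spec))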

Examples

Ingest data after a certain system timestamp:

JSON
"row_filter": "SystemModstamp > '2025-06-10T23:40:11.000-07:00'"

Ingest a specific row:

JSON
"row_filter": "Id = 'a00Qy00000vps2NIAQ'"

Supported operators

The following table shows which operators are supported for row filtering:

Operator   Supported
AND        Yes
OR         Salesforce and Google Analytics only
=          Yes
!=         Yes
LIKE       No
IN         No
<, <=      Yes
>, >=      Yes
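
Because IN is not supported, a membership test must be rewritten as equality checks joined with OR, which works on Salesforce and Google Analytics only (the IDs below are illustrative):

JSON
"row_filter": "Id = 'a00Qy00000vps2NIAQ' OR Id = 'a00Qy00000vps2OIAQ'"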

FAQ

Find answers to frequently asked questions about row filtering.

What happens if a row fails to match the row filter on the initial load but is later updated to match it on a subsequent load?

The row is ingested during the next pipeline update. This does not require a refresh.

What happens if a row matches the row filter on the initial load but is later updated to no longer match it?

The row is not deleted during the next pipeline update.

What happens if I update the query and a previously uningested row now matches?

The row is not ingested during the next pipeline update. To ingest it, run a full refresh.

What happens if I update the query and a previously ingested row no longer matches?

The row is not deleted during the next pipeline update.
