Select rows to ingest

This feature is in Beta.

Applies to: ✓ API-based pipeline authoring ✓ SaaS connectors

Row filtering allows you to ingest only the data you need by applying conditions similar to a SQL WHERE clause. This improves performance (especially for initial loads with historical data) and minimizes data duplication (especially in development environments).

Supported connectors

  • Google Analytics
  • Salesforce
  • ServiceNow

How row filtering works

Row filtering acts like a WHERE filter in SQL. You can compare values in the source against integers, booleans, strings, and other data types. You can also use complex combinations of clauses to pull only the data you need.

Row filtering applies during both the initial load and subsequent incremental updates.
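As an illustrative sketch only (the connector evaluates the filter at the source during ingestion, not in Python), the WHERE-style semantics can be modeled in memory: a row is ingested only when the filter condition evaluates to true for that row.

```python
# Illustrative only: this in-memory version just demonstrates the
# WHERE-clause semantics of a row filter.
rows = [
    {"u_age": 40, "u_active": True},
    {"u_age": 25, "u_active": True},
    {"u_age": 40, "u_active": False},
]

# Equivalent to row_filter = "u_age = 40 AND u_active = TRUE":
ingested = [r for r in rows if r["u_age"] == 40 and r["u_active"]]
# Only the first row satisfies both conditions.
```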

Limitations

Row filtering has the following limitations:

  • Salesforce: Row filtering is only supported on two columns: the primary key (ID, if available) and the cursor column. The connector selects the cursor column from the following list, in order of preference: SystemModstamp, LastModifiedDate, CreatedDate, and LoginTime.

  • ServiceNow:

    • Only the AND operator is supported. The OR operator is not currently available. For example, u_age = 40 AND u_active = TRUE works, but u_age = 40 OR u_active = TRUE does not.
    • Timestamps in the filters must be in the following format: YYYY-MM-DD HH:mm:SS (for example, 2004-03-02 17:14:59).
  • Timestamp column filters: When you filter on timestamp data type columns, the following limitations apply. These limitations do not apply when you filter on other column types.

    • Row filtering only works for incrementally updated tables, not batch updated tables.
    • Row filtering applies on write only, not on read.
  • Row or query updates: If a row matches the filter on the initial load but the row or the query is later changed so that the row no longer matches, the connector does not delete the row on subsequent loads. Likewise, if the query is changed so that a previously unmatched row now matches, the connector does not ingest that row on subsequent updates unless you trigger a full refresh.
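Given the ServiceNow timestamp layout and AND-only restriction above, a compliant filter string can be assembled in Python. This is a sketch; sys_updated_on and u_active are illustrative column names, not prescribed by the connector.

```python
from datetime import datetime

# ServiceNow row filters require timestamps formatted as YYYY-MM-DD HH:mm:SS.
# strftime's "%Y-%m-%d %H:%M:%S" produces exactly that layout.
ts = datetime(2004, 3, 2, 17, 14, 59)
servicenow_ts = ts.strftime("%Y-%m-%d %H:%M:%S")  # '2004-03-02 17:14:59'

# Only AND is supported on ServiceNow, so clauses are joined with AND, not OR.
row_filter = f"sys_updated_on > '{servicenow_ts}' AND u_active = TRUE"
```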

Configure row filtering

To configure a pipeline with row filtering, add the row_filter config to your pipeline specification. For example:

Python
pipeline_spec = """
{
  "name": "...",
  "ingestion_definition": {
    "connection_name": "...",
    "objects": [
      {
        "table": {
          "source_schema": "...",
          "source_table": "...",
          "destination_catalog": "...",
          "destination_schema": "...",
          "destination_table": "...",
          "table_configuration": {
            "row_filter": "..."
          }
        }
      }
    ]
  },
  "channel": "PREVIEW"
}
"""
create_pipeline(pipeline_spec)
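One way to avoid hand-editing the JSON string is to build the specification as a Python dict and serialize it. A sketch using the same placeholder values ("..."), with a concrete row_filter filled in for illustration; create_pipeline is the same helper used above.

```python
import json

# Sketch: build the pipeline spec as a dict, then serialize it to the JSON
# string form shown above. "..." placeholders stand in for your own values.
spec = {
    "name": "...",
    "ingestion_definition": {
        "connection_name": "...",
        "objects": [
            {
                "table": {
                    "source_schema": "...",
                    "source_table": "...",
                    "destination_catalog": "...",
                    "destination_schema": "...",
                    "destination_table": "...",
                    "table_configuration": {
                        # Any supported filter expression goes here.
                        "row_filter": "SystemModstamp > '2025-06-10T23:40:11.000-07:00'"
                    },
                }
            }
        ],
    },
    "channel": "PREVIEW",
}

pipeline_spec = json.dumps(spec, indent=2)
# create_pipeline(pipeline_spec)
```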

Examples

Ingest data after a certain system timestamp:

JSON
"row_filter": "SystemModstamp > '2025-06-10T23:40:11.000-07:00'"

Ingest a specific row:

JSON
"row_filter": "Id = 'a00Qy00000vps2NIAQ'"
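Filters can also combine several clauses. A small helper like the following (hypothetical, not part of any API) joins clauses with AND, which every supported connector accepts:

```python
def and_filter(*clauses: str) -> str:
    """Join filter clauses with AND (supported on all connectors)."""
    return " AND ".join(clauses)

# Combining the two example filters above; Salesforce permits filtering on
# the Id primary key and the cursor column, so both clauses are allowed.
row_filter = and_filter(
    "SystemModstamp > '2025-06-10T23:40:11.000-07:00'",
    "Id != 'a00Qy00000vps2NIAQ'",
)
```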

Supported operators

The following table shows which operators are supported for row filtering:

Operator | Supported
AND      | Yes
OR       | Salesforce and Google Analytics only
=        | Yes
!=       | Yes
LIKE     | No
IN       | No
<, <=    | Yes
>, >=    | Yes
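Because IN is unsupported, one workaround on the connectors that support OR (Salesforce and Google Analytics) is to expand the value list into OR-joined equality clauses. A sketch with a hypothetical helper:

```python
def expand_in(column: str, values: list[str]) -> str:
    """Rewrite an unsupported IN list as OR-joined equality clauses.

    Only valid on connectors that support OR (Salesforce, Google Analytics).
    """
    return " OR ".join(f"{column} = '{v}'" for v in values)

row_filter = expand_in("Id", ["a00Qy00000vps2NIAQ", "a00Qy00000vps2OIAQ"])
# "Id = 'a00Qy00000vps2NIAQ' OR Id = 'a00Qy00000vps2OIAQ'"
```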

Row filtering behavior in edge cases

The following table describes row filtering behavior in edge case scenarios.

Scenario | Behavior | Refresh required
A row fails to match the filter on the initial load, then is updated to match it on a subsequent load. | The row is ingested during the next pipeline update. | No
A row matches the filter on the initial load, then is updated to no longer match it. | The row is not deleted during the next pipeline update. | No
The query is updated, and a previously uningested row now matches. | The row is not ingested during the next pipeline update. | Yes (full refresh required)
The query is updated, and a previously ingested row no longer matches. | The row is not deleted during the next pipeline update. | No

Additional resources