Skip to main content

Vector Search filtering guide

This page describes the filter expression syntax for Vector Search queries. The filter syntax is different for standard and storage-optimized endpoints.

  • Standard endpoints use a Python dictionary, where the key encodes both the column name and the operator (for example, {"price <": 200}).
  • Storage-optimized endpoints use a SQL-like filter string similar to a WHERE clause (for example, "price < 200").

For a complete reference of supported operators and where to find filters in the query API, see Use filters on queries. For example notebooks, see Example notebooks.

Standard endpoints

Standard endpoints accept a Python dictionary passed to similarity_search() through the filters parameter. The key encodes both the column name and the operator (for example, {"price >": 40000}), and multiple keys in the same dict are combined with AND logic. This section summarizes the supported operators and known limitations.

Quick reference

String columns

Operator

Syntax

Example

Exact match

{"col": "val"}

{"make": "Toyota"}

Negation

{"col NOT": "val"}

{"make NOT": "Ford"}

OR (multiple values)

{"col": ["v1","v2"]}

{"make": ["Toyota","Honda"]}

Token-based LIKE

{"col LIKE": "token"}

{"color LIKE": "red"}

Hyphenated values

{"col": "F-150"}

{"model": "F-150"}

JSON pattern LIKE

[{"col LIKE": "%pattern%"}, {"col2 op": n}]

[{"specs LIKE": '%"drivetrain":"AWD"%'}, {"price <": 50000}]

Numeric columns (INT, DOUBLE)

Operator

Syntax

Example

Greater than

{"col >": n}

{"price >": 40000}

Less than or equal

{"col <=": n}

{"price <=": 55000}

Greater than or equal

{"col >=": n}

{"rating >=": 4.5}

Integer match

{"col": n}

{"year": 2024}

Range

Two keys in dict

{"price >=": 30000, "price <=": 55000}

Boolean columns

Operator

Syntax

Example

Match true

{"col": true}

{"in_stock": true}

Match false

{"col": false}

{"in_stock": false}

Array columns

Standard endpoints support primitive ARRAY types: ARRAY<STRING>, ARRAY<INT>, ARRAY<BIGINT>, ARRAY<SMALLINT>, ARRAY<TINYINT>, ARRAY<FLOAT>, ARRAY<DOUBLE>, ARRAY<BOOLEAN>, ARRAY<DATE>, and ARRAY<TIMESTAMP>. ARRAY<STRUCT> is not supported.

Operator

Syntax

Example

Contains a value

{"col": "val"}

{"body_type": "sedan"}

Contains any value (OR)

{"col": ["v1","v2"]}

{"body_type": ["hybrid","electric"]}

AND with another column

{"col": ["v1","v2"], "col2": "val"}

{"body_type": ["hybrid", "electric"], "make": "BMW"}

Timestamp columns (native TIMESTAMP or DATE)

Operator

Syntax

Example

After

{"col >": "ISO8601Z"}

{"listed_at >": "2024-01-01T00:00:00Z"}

Exact timestamp match

{"col": "ISO8601Z"}

{"listed_at": "2024-01-10T09:15:00Z"}

Negation

{"col NOT": "ISO8601Z"}

{"listed_at NOT": "2024-01-10T09:15:00Z"}

Range

Two keys in dict

{"listed_at >=": "2024-01-01T00:00:00Z", "listed_at <": "2024-04-01T00:00:00Z"}

Combined filters

Multiple keys in a single dict are combined with AND logic. Use the {"col1 OR col2 op": [v1, v2]} syntax for OR across different fields.

Pattern

Syntax

Example

String + numeric

Multiple keys

{"make": "BMW", "price >": 60000}

Numeric + numeric

Multiple keys

{"price >": 50000, "rating >=": 4.7}

String + numeric + array

Multiple keys

{"make": ["Tesla", "Toyota"], "price <=": 55000, "body_type": "electric"}

String + array + numeric

Multiple keys

{"body_type": ["hybrid", "electric"], "price <": 70000, "year": 2024}

String + numeric + timestamp

Multiple keys

{"make": "Toyota", "price <=": 40000, "listed_at >=": "2024-01-01T00:00:00Z"}

OR across fields

Combined key

{"make OR price <=": ["Tesla", 30000]}

JSON LIKE + numeric

List of dicts

[{"specs LIKE": '%"drivetrain":"AWD"%'}, {"price <": 50000}]

AND on the same field

List of dicts

[{"make_model_year LIKE": "%Tesla%"}, {"make_model_year LIKE": "%2024%"}]

Limitations

Limitation

Detail

Workaround

ARRAY<struct> not supported

Creating an index with ARRAY<struct> columns raises BadRequest: Invalid column type in schema. Supported field types include array<float>, array<tinyint>, array<double>, timestamp, tinyint, float, smallint, array<date>, string, array<boolean>, array<smallint>, double, boolean, date, int, array<string>, array<bigint>, array<timestamp>, bigint, and array<int>.

Unflatten the struct into a string column for indexing and filtering.

LIKE is token-based only

Matches whole whitespace-separated tokens, not SQL wildcard patterns (%, _).

Use a storage-optimized endpoint for wildcard LIKE.

STRING date columns not supported

Using >, <, >=, or <= on STRING columns raises BadRequest: Please use a numeric value.

Use a native TIMESTAMP column with ISO 8601 values, or store dates as epoch milliseconds.

BETWEEN not supported

There is no BETWEEN operator.

Use two keys in the dict, for example {"price >=": 30000, "price <=": 55000}.

No SQL functions in filters

Functions like to_timestamp() are not supported in dict filters.

Use a storage-optimized endpoint with a SQL filter string. For timestamps, store as epoch milliseconds.

JSON numeric filtering not supported

Standard endpoints cannot extract or cast nested JSON values (for example, specs.hp >, CAST(), or get_json_object()).

Use LIKE patterns on the JSON string, or pre-extract the value as a top-level INT column.

Duplicate dict keys silently dropped

Python keeps only the last value for duplicate keys in a dict. For example, {"make_model_year LIKE": "%Tesla%", "make_model_year LIKE": "%2024%"} only applies the second filter.

Use a list of dicts: [{"make_model_year LIKE": "%Tesla%"}, {"make_model_year LIKE": "%2024%"}].

Storage-optimized endpoints

Storage-optimized endpoints accept a SQL-like filter string passed to similarity_search() through the filters parameter. This section summarizes the supported operators and known limitations.

Quick reference

String columns

Operator

Syntax

Example

Exact match

col = 'val'

make = 'Toyota'

Negation

col != 'val'

make != 'Ford'

OR (multiple values)

col IN ('v1','v2')

make IN ('Toyota','Honda')

Wildcard pattern

col LIKE 'pat%'

color LIKE 'bl%'

Hyphenated values

col = 'val-ue'

model = 'F-150'

JSON substring match

col LIKE '%"k":v%'

specs LIKE '%"hp":4%'

Numeric columns (INT, DOUBLE)

Operator

Syntax

Example

Greater than

col > n

price > 40000

Less than or equal

col <= n

price <= 25000

Greater than or equal

col >= n

rating >= 4.7

Integer match

col = n

year = 2024

Range

col >= a AND col <= b

price >= 30000 AND price <= 55000

Boolean columns

Operator

Syntax

Example

Boolean true

col IS TRUE

in_stock IS TRUE

Array columns

Array filtering is not supported on storage-optimized endpoints. ARRAY_CONTAINS raises a BadRequest: Syntax error. As a workaround, concatenate array values into a string column and use LIKE. For example:

  • "body_type LIKE '%sedan%'"
  • "body_type LIKE '%hybrid%' OR body_type LIKE '%electric%'"

Timestamp columns (native TIMESTAMP)

Operator

Syntax

Example

After date

col > TO_TIMESTAMP('ISO8601')

listed_at > TO_TIMESTAMP('2024-03-01T00:00:00')

Range

Combine with AND

listed_at >= TO_TIMESTAMP('2024-01-01T00:00:00') AND listed_at < TO_TIMESTAMP('2024-04-01T00:00:00')

Combined filters

Pattern

Syntax

Example

String + numeric

A AND B

make = 'BMW' AND price > 60000

Numeric + numeric

A AND B

price > 50000 AND rating >= 4.7

String + numeric + IN

A AND B AND C

make IN ('Tesla', 'Toyota') AND price <= 55000 AND year >= 2022

OR across fields

A OR (B AND C)

make = 'Tesla' OR (make = 'BMW' AND price > 60000)

Timestamp + numeric + string

A AND B AND C

listed_at >= TO_TIMESTAMP('2024-01-01T00:00:00') AND price < 40000 AND make = 'Toyota'

Limitations

Limitation

Detail

Workaround

ARRAY<struct> not supported

Creating an index with ARRAY<struct> columns raises BadRequest: Invalid column type in schema. Supported field types include array<float>, array<tinyint>, array<double>, timestamp, tinyint, float, smallint, array<date>, string, array<boolean>, array<smallint>, double, boolean, date, int, array<string>, array<bigint>, array<timestamp>, bigint, and array<int>.

Unflatten the struct into a string column for indexing and filtering.

ARRAY_CONTAINS not supported

ARRAY_CONTAINS(col, 'val') raises BadRequest: Syntax error. Array filtering is not supported in storage-optimized filter strings.

Concatenate array values into a string column and use LIKE.

BETWEEN not supported

Using BETWEEN raises BadRequest: no viable alternative at input 'priceBETWEEN'.

Use price >= a AND price <= b.

Bare timestamp strings cause type mismatch

listed_at > '2024-03-01T00:00:00' raises BadRequest: Cannot compare timestamp[us] with STRING_LITERAL.

Wrap the value with TO_TIMESTAMP(), for example listed_at > TO_TIMESTAMP('2024-03-01T00:00:00'). The column must be a native TIMESTAMP type.

JSON numeric filtering not supported

SQL functions like CAST(get_json_object(specs, '$.hp') AS INT) > 300 raise BadRequest: Syntax error. Expression-based filters are not supported in storage-optimized filter strings.

Pre-extract JSON fields into top-level columns at index creation time, or use LIKE pattern matching (for example, specs LIKE '%"hp":4%' OR specs LIKE '%"hp":5%' to approximate hp values of 400-599).

Post-filtering (overfetch)

Results are ranked by relevance first, then filtered. Most cases are handled automatically, but in rare scenarios—mainly LIKE '%...%' filters on more than 40 indexes with low num_results—matching documents with low relevance scores might not appear even if they satisfy the filter.

Increase num_results to widen the candidate pool.

Example notebooks

Standard endpoint setup and filtering notebook

Open notebook in new tab

Storage-optimized endpoint setup and filtering notebook

Open notebook in new tab