vector_search function

Applies to: check marked yes Databricks SQL

Preview

This feature is in Public Preview.

The vector_search() function allows you to query a Mosaic AI Vector Search index using SQL.

Requirements

Syntax

vector_search(index, query, num_results)

Arguments

All arguments must be passed by name, like vector_search(index => indexName, query => queryText).

  • index: A STRING constant, the fully qualified name of an existing vector search index in the same workspace for invocations. The definer must have “Select” permission on the index.

  • query: An STRING expression, the string to search for in the index.

  • num_results (optional): An integer constant, the max number of records to return. Defaults to 10.

Returns

A table of the top matching records from the index. All the columns of the index are included.

Examples

Search over an index of product SKUs to find similar products by name.

SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query => "iphone", num_results => 2)

ID

Product name

10

iPhone

20

iPhone SE

The following example searches for multiple terms at the same time by using a LATERAL subquery.

SELECT
  query_txt,
  query_id,
  search.*
FROM
  query_table,
  LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query => query_txt, num_results => 2)
  ) as search

query_txt

query_id

search.id

search.product_name

iphone

1

10

iPhone 10

iphone

1

20

iPhone SE

pixel 8

2

30

Pixel 8

pixel 8

2

40

Pixel 8a

Limitations

The following limitations apply during the preview:

  • Querying DIRECT_ACCESS index types are not supported.

  • Indexes with embedding_vector_columns are not supported.

  • Input parameters filters_json or columns are not supported.

  • Vector Search with num_results greater than 100 are not supported.

  • Users who do not have READ access to the source table cannot use vector_search().

  • Hybrid keyword-similarity search is not supported using vector_search().

  • vector_search cannot be used with model serving endpoints using Foundation Model APIs provisioned throughput.