`vector_search` function

Applies to: Databricks SQL

Preview

The vector_search() function allows you to query a Mosaic AI Vector Search index using SQL.

Requirements

This function is not available on classic SQL warehouses.
For more information, see Databricks SQL pricing page.
This function is available in regions where Mosaic AI Vector Search is supported.

Syntax

In Databricks Runtime 15.3 and above, use query_text or query_vector to specify what to search for in the index.

SQL
SELECT * FROM vector_search(
  index,
  { query_text | query_vector },
  [ num_results ]
)

In Databricks Runtime 15.2 and below, use query to specify what to search for in the index.

SQL
SELECT * FROM vector_search(
  index, query, num_results
)

Arguments

All arguments must be passed by name, like vector_search(index => indexName, query_text => queryText).

index: A STRING constant, the fully qualified name of an existing vector search index in the same workspace for invocations. The definer must have Select permission on the index.
Use one of the following to specify the expression to search for in the index:
- For Databricks Runtime 15.3 or above, use query_text to search for a specific string of text in the embedding source column of your Delta Sync Index. The query must be a STRING expression of the string to search for in the index.
- For Databricks Runtime 15.3 or above, use query_vector to search for a specific vector in the embedding vector column of your Delta Sync Index. This argument is Required for searching a Delta Sync Index using self-managed vectors. The query must be an ARRAY<FLOAT>, or ARRAY<DOUBLE>, or ARRAY<DECIMAL(_, _)> expression of the embedding vector to search for in the index.
- For Databricks Runtime 15.2 or below, use query to specify the string to search for in your index.
num_results (optional): An integer constant, the max number of records to return. Defaults to 10.
query_type(optional): The type of search you want to perform on your vector search index. Defaults to ANN if not explicitly specified.
- If the query_type is ANN , then either query_text or query_vector must be specified to perform an approximate nearest neighbor search or similarity search.
- If the query_type is HYBRID, then query_text must be specified. You can specify both query_text and query_vector for a hybrid search. Hybrid search in this instance means a combination of similarity search and keyword search where a keyword search uses the literal text as the target.

The following table summarizes which arguments can be used when you have a Delta Sync index with an embedding model:

`query_type`	`query_text`	`query_vector`	Description
ANN (also known as approximate nearest neighbor or similarity search)	✓		For a similarity search for a specific string of text or a vector, you can specify `query_text`. The embedding can be automatically computed from `query_text`, and used for the similarity search, so no need to specify `query_vector`.
ANN (also known as approximate nearest neighbor or similarity search)		✓	Use `query_vector` for a similarity search.
Hybrid	✓		Use `query_text` for a keyword search. Keyword search uses the literal text as target, so `query_text` is required. The embedding can be automatically computed from `query_text`, and used for the similarity search, so no need to specify `query_vector`.
Hybrid	✓	✓	Use `query_text` for keyword search and use `query_vector` for similarity search.

The following table describes the different scenarios and the arguments that can be used when you have a Delta sync index without an embedding model:

`query_type`	`query_text`	`query_vector`	Description
`ANN` (also known as approximate nearest neighbor or similarity search)		✓	Use`query_vector` for a similarity search. Similarity search requires an embedding vector as a search target. Because an embedding model is not available in this scenario to calculate the embedding vector, you must provide it.
Hybrid	✓	✓	For a hybrid search for a specific string of text and a vector, specify both `query_text` and `query_vector`.

Returns

A table of the top matching records from the index. All the columns of the index are included.

Examples

The following sections show example SQL queries for different index searches.

Hybrid search queries

The following hybrid search example combines the following search types to find the provided terms in text or metadata of the vector search index:

Vector similarity search: To find similar semantic meaning for Wi-Fi issues.
Keyword search: To find Wi-Fi issues LMP-9R2 on a keyword index.

SQL
SELECT * FROM vector_search(
  index => 'main.support_docs.index',
  query_text => 'Wi-Fi issues LMP-9R2',
  query_type => 'HYBRID',
  num_results => 3)

doc_id	title	product_code
1403	Wi-Fi Troubleshooting Guide — LMP-9R2	LMP-9R2
1332	Known Connectivity Issues for LMP-9R2 Devices	LMP-9R2
1271	General Wi-Fi Troubleshooting Guide	LMP-8R2

The following hybrid search example specifies both query_text and query_vector for the term, Wi-Fi issues LMP-9R2. In this example, keyword search performs better on proprietary terms unique to a company (like “LMP-9R2” in this case), whereas vector search, which are typically trained on public datasets, does not recognize terms like “LMP-9R2.”

SQL

SELECT * FROM vector_search(
  index => 'main.support_docs.index',
  query_text => 'Wi-Fi issues LMP-9R2',
  query_vector => array( 0.0213, 0.1045, 0.0871, 0.0562, 0.1459, ... 0.0131),-- a self computed embedding of the `query_text` param
  query_type => 'HYBRID',
  num_results => 3 )

doc_id	title	product_code
1403	Wi-Fi Troubleshooting Guide — LMP-9R2	LMP-9R2
1332	Known Connectivity Issues for LMP-9R2 Devices	LMP-9R2
1271	General Wi-Fi Troubleshooting Guide	LMP-8R2

Text queries on indexes with embedding source columns

Search over an index of product SKUs to find similar products by name. The following example uses query_text which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query instead of query_text.

SQL

SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => "iphone", num_results => 2)

ID	Product name
10	iPhone
20	iPhone SE

The following example searches for multiple terms at the same time by using a LATERAL subquery.

SQL

SELECT
  query_txt,
  query_id,
  search.*
FROM
  query_table,
  LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => query_txt, num_results => 2)
  ) as search

`query_txt`	query_id	search.id	search.product_name
iphone	1	10	iPhone 10
iphone	1	20	iPhone SE
pixel 8	2	30	Pixel 8
pixel 8	2	40	Pixel 8a

Text queries on indexes with embedding source columns

Search over an index of images with pre-computed embeddings to find similar images by embedding. The following example uses query_vector which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query instead of query_vector.

SQL

SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45, -0.35, 0.78, 0.22), num_results => 3)

SQL

SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45F, -0.35F, 0.78F, 0.22F), num_results => 3)

SQL

SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45D, -0.35D, 0.78D, 0.22D), num_results => 3)

id	image_name	image_embedding
28	horse.jpg	[0.46, -0.34, 0.77, 0.21]
27	donkey.jpg	[0.44, -0.36, 0.79, 0.23]
5	elk.jpg	[0.23, -0.44, 0.77, 0.80]

The following example searches for multiple terms at the same time by using a LATERAL subquery.

SQL

SELECT
  query_embedding,
  search.*
FROM
  query_table,
  LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => image_embedding, num_results => 1)
  ) as search

query_embedding	search.id	search.image_name	search_image_embedding
[0.45, -0.35, 0.78, 0.22]	27	donkey.jpg	[0.46, -0.34, 0.77, 0.21]
[0.14, 0.29, 0.30, -0.90]	3	landscape.jpg	[0.15, 0.30, 0.31, -0.91]
[0.23, -0.44, 0.77, 0.80]	10	golden_gate_bridge.jpg	[0.28, -0.40, 0.23, 0.81]
[0.88, 0.88, 0.88, 0.88]	44	blank.jpg	[0.88, 0.88, 0.88, 0.88]

Limitations

The following limitations apply during the preview:

Querying DIRECT_ACCESS index types are not supported.
Input parameters filters_json or columns are not supported.
Vector Search with num_results greater than 100 are not supported.
vector_search cannot be used with model serving endpoints using Foundation Model APIs provisioned throughput.

Requirements​

Syntax​

Arguments​

Returns​

Examples​

Hybrid search queries​

Text queries on indexes with embedding source columns​

Text queries on indexes with embedding source columns​

Limitations​

Requirements

Syntax

Arguments

Returns

Examples

Hybrid search queries

Text queries on indexes with embedding source columns

Text queries on indexes with embedding source columns

Limitations