Query a vector search index
This article describes how to query a vector search index, including how to use filters and reranking.
For example notebooks illustrating how to create and query vector search endpoints and indexes, see Vector search example notebooks. For reference information, see the Python SDK reference.
Installation
To use the vector search SDK, you must install it in your notebook. Use the following code to install the package:
%pip install databricks-vectorsearch
dbutils.library.restartPython()
Then use the following command to import VectorSearchClient:
from databricks.vector_search.client import VectorSearchClient
For information about authentication, see Data protection and authentication.
How to query a vector search index
You can query a vector search index only through the Python SDK, the REST API, or the SQL vector_search() AI function.
If the user querying the index is not the owner of the vector search index, the user must have the following Unity Catalog (UC) privileges:
- USE CATALOG on the catalog that contains the vector search index.
- USE SCHEMA on the schema that contains the vector search index.
- SELECT on the vector search index.
The default query type is ann (approximate nearest neighbor). To perform a hybrid keyword-similarity search, set the parameter query_type to hybrid. With hybrid search, all text metadata columns are included, and a maximum of 200 results are returned.
To use the reranker in a query, see Use the reranker in a query.
Full-text search is available as a beta feature. To perform a full-text search, set the parameter query_type to FULL_TEXT. With full-text search, you can retrieve up to 200 results based on keyword matching without using vector embeddings.
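The query-type rules above can be checked before a query is sent. The following is a minimal sketch, not part of the SDK: `build_query_kwargs` is a hypothetical helper, and comparing `query_type` case-insensitively is an assumption. It builds the keyword arguments for `index.similarity_search` and rejects hybrid or full-text requests for more than 200 results.

```python
# Limit stated above for hybrid and full-text queries.
MAX_RESULTS_KEYWORD_MODES = 200


def build_query_kwargs(query_text, columns, num_results, query_type="ann"):
    """Build keyword arguments for index.similarity_search (hypothetical helper)."""
    # Assumption: query_type values are compared case-insensitively here.
    if query_type.lower() in ("hybrid", "full_text") and num_results > MAX_RESULTS_KEYWORD_MODES:
        raise ValueError(
            f"{query_type} queries return at most {MAX_RESULTS_KEYWORD_MODES} results"
        )
    kwargs = {
        "query_text": query_text,
        "columns": list(columns),
        "num_results": num_results,
    }
    # ann is the default query type, so it is only passed when it differs.
    if query_type != "ann":
        kwargs["query_type"] = query_type
    return kwargs
```

With an existing `index` object, the result could be passed as `index.similarity_search(**build_query_kwargs("Greek myths", ["id", "field2"], 2, query_type="hybrid"))`.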
- Python SDK standard endpoint
- Python SDK storage-optimized endpoint
- REST API
- SQL
For details, see the Python SDK reference.
# Delta Sync Index with embeddings computed by Databricks
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "field2"],
num_results=2
)
# Delta Sync Index using hybrid search, with embeddings computed by Databricks
results3 = index.similarity_search(
query_text="Greek myths",
columns=["id", "field2"],
num_results=2,
query_type="hybrid"
)
# Delta Sync Index using full-text search (Beta)
results4 = index.similarity_search(
query_text="Greek myths",
columns=["id", "field2"],
num_results=2,
query_type="FULL_TEXT"
)
# Delta Sync Index with pre-calculated embeddings
results2 = index.similarity_search(
query_vector=[0.9] * 1024,
columns=["id", "text"],
num_results=2
)
For details, see the Python SDK reference.
Storage-optimized vector search indexes use a SQL-like filter string instead of the filter dictionary used by standard vector search endpoints.
client = VectorSearchClient()
index = client.get_index(index_name="vector_search_demo.vector_search.en_wiki_index")
# similarity search with query vector
results = index.similarity_search(
query_vector=[0.2, 0.33, 0.19, 0.52],
columns=["id", "text"],
num_results=2
)
# similarity search with query vector and filter string
results = index.similarity_search(
query_vector=[0.2, 0.33, 0.19, 0.52],
columns=["id", "text"],
# this is a single filter string similar to SQL WHERE clause syntax
filters="language = 'en' AND country = 'us'",
num_results=2
)
See the REST API reference documentation: POST /api/2.0/vector-search/indexes/{index_name}/query.
For production applications, Databricks recommends using service principals instead of personal access tokens. In addition to improved security and access management, using service principals can improve performance by up to 100 ms per query.
The following code example illustrates how to query an index using a service principal.
export SP_CLIENT_ID=...
export SP_CLIENT_SECRET=...
export INDEX_NAME=...
export WORKSPACE_URL=https://...
export WORKSPACE_ID=...
# Set authorization details to generate OAuth token
export AUTHORIZATION_DETAILS='{"type":"unity_catalog_permission","securable_type":"table","securable_object_name":"'"$INDEX_NAME"'","operation": "ReadVectorIndex"}'
# If you are using a route-optimized embedding model endpoint, you also need additional authorization details to invoke the serving endpoint
# export EMBEDDING_MODEL_SERVING_ENDPOINT_ID=...
# export AUTHORIZATION_DETAILS="$AUTHORIZATION_DETAILS"',{"type":"workspace_permission","object_type":"serving-endpoints","object_path":"/serving-endpoints/'"$EMBEDDING_MODEL_SERVING_ENDPOINT_ID"'","actions": ["query_inference_endpoint"]}'
# Generate OAuth token
export TOKEN=$(curl -X POST --url $WORKSPACE_URL/oidc/v1/token -u "$SP_CLIENT_ID:$SP_CLIENT_SECRET" --data 'grant_type=client_credentials' --data 'scope=all-apis' --data-urlencode 'authorization_details=['"$AUTHORIZATION_DETAILS"']' | jq .access_token | tr -d '"')
# Get index URL
export INDEX_URL=$(curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url $WORKSPACE_URL/api/2.0/vector-search/indexes/$INDEX_NAME | jq -r '.status.index_url' | tr -d '"')
# Query vector search index with `query_vector`
curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url https://$INDEX_URL/query --data '{"num_results": 3, "query_vector": [...], "columns": [...], "debug_level": 1}'
# Query vector search index with `query_text`
curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url https://$INDEX_URL/query --data '{"num_results": 3, "query_text": "...", "columns": [...], "debug_level": 1}'
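The nested shell quoting in the `AUTHORIZATION_DETAILS` variable is easy to get wrong, so it can help to build the same structure in Python. The following is a minimal sketch that mirrors the fields used in the shell example above. The function names `build_authorization_details` and `build_token_form` are hypothetical, and the code only prepares the form body for the token request; it does not send it.

```python
import json
from urllib.parse import urlencode


def build_authorization_details(index_name, embedding_endpoint_id=None):
    """Return the authorization_details list used in the shell example above."""
    details = [
        {
            "type": "unity_catalog_permission",
            "securable_type": "table",
            "securable_object_name": index_name,
            "operation": "ReadVectorIndex",
        }
    ]
    # Only needed for a route-optimized embedding model endpoint.
    if embedding_endpoint_id is not None:
        details.append(
            {
                "type": "workspace_permission",
                "object_type": "serving-endpoints",
                "object_path": f"/serving-endpoints/{embedding_endpoint_id}",
                "actions": ["query_inference_endpoint"],
            }
        )
    return details


def build_token_form(index_name, embedding_endpoint_id=None):
    """Return the URL-encoded form body for the OAuth token request."""
    details = build_authorization_details(index_name, embedding_endpoint_id)
    return urlencode(
        {
            "grant_type": "client_credentials",
            "scope": "all-apis",
            "authorization_details": json.dumps(details),
        }
    )
```

The returned string corresponds to the `--data` and `--data-urlencode` arguments of the token request, which is still authenticated with the service principal's client ID and secret.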
The following code example illustrates how to query an index using a personal access token (PAT).
export TOKEN=...
export INDEX_NAME=...
export WORKSPACE_URL=https://...
# Query vector search index with `query_vector`
curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url $WORKSPACE_URL/api/2.0/vector-search/indexes/$INDEX_NAME/query --data '{"num_results": 3, "query_vector": [...], "columns": [...], "debug_level": 1}'
# Query vector search index with `query_text`
curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url $WORKSPACE_URL/api/2.0/vector-search/indexes/$INDEX_NAME/query --data '{"num_results": 3, "query_text": "...", "columns": [...], "debug_level": 1}'
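The same query can be prepared from Python with only the standard library. The following is a sketch under stated assumptions: `build_query_request` is a hypothetical helper, the request uses POST as listed in the REST API reference above, and the workspace URL, index name, and token are placeholders. The function builds the request object but does not send it.

```python
import json
import urllib.request


def build_query_request(workspace_url, index_name, token, *, columns, num_results,
                        query_text=None, query_vector=None, debug_level=1):
    """Build (but do not send) a query request for a vector search index."""
    if query_text is None and query_vector is None:
        raise ValueError("Provide query_text, query_vector, or both")
    payload = {
        "num_results": num_results,
        "columns": list(columns),
        "debug_level": debug_level,
    }
    if query_text is not None:
        payload["query_text"] = query_text
    if query_vector is not None:
        payload["query_vector"] = list(query_vector)
    url = f"{workspace_url.rstrip('/')}/api/2.0/vector-search/indexes/{index_name}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To run the query, pass the returned object to `urllib.request.urlopen` and decode the JSON response body.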
The vector_search() AI function is in Public Preview.
To use this AI function, see vector_search function.
Use filters on queries
A query can define filters based on any column in the Delta table. similarity_search returns only rows that match the specified filters.
The following table lists the supported filters.
| Filter operator | Behavior | Examples |
|---|---|---|
| NOT | Standard: Negates the filter. The key must end with "NOT". For example, "color NOT" with value "red" matches documents where the color is not red. | Standard: {"color NOT": "red"} Storage-optimized: 'title != "Hercules"' |
| < | Standard: Checks if the field value is less than the filter value. The key must end with " <". For example, "price <" with value 200 matches documents where the price is less than 200. | Standard: {"price <": 200} |
| <= | Standard: Checks if the field value is less than or equal to the filter value. The key must end with " <=". For example, "price <=" with value 200 matches documents where the price is less than or equal to 200. | Standard: {"price <=": 200} |
| > | Standard: Checks if the field value is greater than the filter value. The key must end with " >". For example, "price >" with value 200 matches documents where the price is greater than 200. | Standard: {"price >": 200} |
| >= | Standard: Checks if the field value is greater than or equal to the filter value. The key must end with " >=". For example, "price >=" with value 200 matches documents where the price is greater than or equal to 200. | Standard: {"price >=": 200} |
| OR | Standard: Checks if the field value matches any of the filter values. The key must contain "OR" between the column names, as in "title OR id". | Standard: {"title OR id": ["Ares", "Athena"]} Storage-optimized: 'title = "Ares" OR id = "Athena"' |
| LIKE | Standard: Matches whitespace-separated tokens in a string. See the code examples below. | Standard: {"column LIKE": "apple"} |
| No filter operator specified | Standard: Filter checks for an exact match. If multiple values are specified, it matches any of the values. | Standard: {"title": ["Ares", "Athena"]} Storage-optimized: 'title IN ("Ares", "Athena")' |
| | Storage-optimized: Filter on a timestamp. | |

Storage-optimized endpoints express these conditions in a SQL-like filter string instead of dictionary keys.
See the following code examples:
- Python SDK standard endpoint
- Python SDK storage-optimized endpoint
- REST API
- LIKE
# Standard endpoint: match rows where `title` exactly matches `Athena` or `Ares`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters={"title": ["Ares", "Athena"]},
num_results=2
)
# Match rows where `title` or `id` exactly matches `Athena` or `Ares`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters={"title OR id": ["Ares", "Athena"]},
num_results=2
)
# Match only rows where `title` is not `Hercules`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters={"title NOT": "Hercules"},
num_results=2
)
# Storage-optimized endpoint: match rows where `title` exactly matches `Athena` or `Ares`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters='title IN ("Ares", "Athena")',
num_results=2
)
# Match rows where `title` is `Ares` or `id` is `Athena`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters='title = "Ares" OR id = "Athena"',
num_results=2
)
# Match only rows where `title` is not `Hercules`
results = index.similarity_search(
query_text="Greek myths",
columns=["id", "text"],
filters='title != "Hercules"',
num_results=2
)
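When moving queries from a standard endpoint to a storage-optimized endpoint, simple filter dictionaries can be translated into filter strings mechanically. The following is a rough sketch, not an SDK feature: `dict_filter_to_string` is a hypothetical helper, joining multiple dictionary entries with `AND` is an assumption, and the single-quote escaping follows common SQL practice rather than a documented rule. It covers only exact match, `NOT`, and the comparison operators, and raises an error for anything else, such as `OR` or `LIKE` keys.

```python
_COMPARISONS = ("<=", ">=", "<", ">")


def _literal(value):
    """Render a number as-is and any other value as a single-quoted string."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Assumption: single quotes are escaped by doubling them, as in SQL.
    return "'" + str(value).replace("'", "''") + "'"


def dict_filter_to_string(filters):
    """Translate a simple standard-endpoint filter dict into a filter string."""
    clauses = []
    for key, value in filters.items():
        # Keys look like "column" or "column OPERATOR".
        column, _, op = key.partition(" ")
        op = op.strip()
        is_list = isinstance(value, (list, tuple))
        if op == "" and is_list:
            items = ", ".join(_literal(v) for v in value)
            clauses.append(f"{column} IN ({items})")
        elif is_list:
            raise ValueError(f"List values are not handled for key: {key!r}")
        elif op == "":
            clauses.append(f"{column} = {_literal(value)}")
        elif op == "NOT":
            clauses.append(f"{column} != {_literal(value)}")
        elif op in _COMPARISONS:
            clauses.append(f"{column} {op} {_literal(value)}")
        else:
            raise ValueError(f"Unsupported filter key: {key!r}")
    # Assumption: multiple entries must all match, so they are joined with AND.
    return " AND ".join(clauses)
```

For example, `dict_filter_to_string({"title NOT": "Hercules"})` produces a string that can be passed as the `filters` argument in the storage-optimized examples above. Check the generated string against the index before relying on it.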
LIKE examples
{"column LIKE": "apple"} matches the strings "apple" and "apple pear" but does not match "pineapple" or "pear". It does not match "pineapple", even though "apple" is a substring, because LIKE looks for an exact match on whitespace-separated tokens, as in "apple pear".
{"column NOT LIKE": "apple"} does the opposite. It matches "pineapple" and "pear" but does not match "apple" or "apple pear".
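The token-matching behavior described above can be modeled in a few lines of Python. This is an illustrative sketch of the semantics only, not how the service implements LIKE. The function names are hypothetical, and case-sensitive comparison is an assumption.

```python
def like_matches(field_value, token):
    """Return True if token equals one of the whitespace-separated tokens."""
    # Assumption: the comparison is case-sensitive.
    return token in field_value.split()


def not_like_matches(field_value, token):
    """Return True if no whitespace-separated token equals token."""
    return not like_matches(field_value, token)


# The four strings used in the examples above.
for value in ["apple", "apple pear", "pineapple", "pear"]:
    print(value, like_matches(value, "apple"), not_like_matches(value, "apple"))
```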
Use the reranker in a query
This feature is in Public Preview.
Agent performance depends on retrieving the most relevant information for a query. Reranking is a technique that improves retrieval quality by evaluating the retrieved documents to identify the ones that are semantically most relevant. Databricks has developed a research-based compound AI system to identify these documents. You can also specify columns containing metadata that you want the reranker to use for additional context as it assesses each document's relevance.
Reranking adds a small amount of latency but can significantly improve retrieval quality and agent performance. Databricks recommends trying out reranking for any RAG agent use case.
The examples in this section show how to use the vector search reranker. When you use the reranker, you set the columns to return (columns) and the metadata columns to use for reranking (columns_to_rerank) separately. num_results is the final number of results to return. This does not affect the number of results used for reranking.
The query debug message includes information about how long the reranking step took. For example:
'debug_info': {'response_time': 1647.0, 'ann_time': 29.0, 'reranker_time': 1573.0}
If the reranker call fails, that information is included in the debug message:
'debug_info': {'response_time': 587.0, 'ann_time': 331.0, 'reranker_time': 246.0, 'warnings': [{'status_code': 'RERANKER_TEMPORARILY_UNAVAILABLE', 'message': 'The reranker is temporarily unavailable. Results returned have not been processed by the reranker. Please try again later for reranked results.'}]}
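A caller can inspect these fields to detect when results were returned without reranking. The following is a minimal sketch based only on the two sample messages above. The helper name `summarize_debug_info` is hypothetical, and it takes the `debug_info` dictionary itself, so how that dictionary is extracted from the full response is left to the caller.

```python
def summarize_debug_info(debug_info):
    """Summarize reranker warnings and timing from a debug_info dictionary."""
    warnings = debug_info.get("warnings", [])
    warning_codes = [w.get("status_code") for w in warnings]
    response_time = debug_info.get("response_time")
    reranker_time = debug_info.get("reranker_time")
    # Fraction of the total response time spent in the reranking step.
    share = None
    if response_time and reranker_time is not None:
        share = reranker_time / response_time
    return {"warning_codes": warning_codes, "reranker_share": share}


ok = summarize_debug_info(
    {"response_time": 1647.0, "ann_time": 29.0, "reranker_time": 1573.0}
)
failed = summarize_debug_info(
    {
        "response_time": 587.0,
        "ann_time": 331.0,
        "reranker_time": 246.0,
        "warnings": [{"status_code": "RERANKER_TEMPORARILY_UNAVAILABLE"}],
    }
)
```

A non-empty `warning_codes` list signals that the results may not have been reranked and that the query could be retried later.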
The order that columns are listed in columns_to_rerank is important. The reranking calculation takes the columns in the order they are listed, and considers only the first 2000 characters it finds.
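To see why column order matters, it can help to preview which text would fall inside the 2000-character window. The sketch below is an approximation for illustration: `reranker_input_preview` is a hypothetical helper, and joining column values with a newline is an assumption, because the exact way the reranker combines columns is not described here.

```python
def reranker_input_preview(row, columns_to_rerank, limit=2000):
    """Approximate the text considered for one row, in column order."""
    # Assumption: column values are joined with a newline before truncation.
    parts = [str(row[col]) for col in columns_to_rerank if row.get(col) is not None]
    return "\n".join(parts)[:limit]


row = {"text": "a" * 1900, "parent_doc_summary": "b" * 500}

# With "text" first, most of the window is used before the summary begins.
text_first = reranker_input_preview(row, ["text", "parent_doc_summary"])

# With the summary first, the whole summary fits and "text" is cut short.
summary_first = reranker_input_preview(row, ["parent_doc_summary", "text"])
```

Listing the most informative columns first keeps them inside the window when rows contain long text.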
- Python SDK
- REST API
# Install the most recent version.
# Databricks SDK version 0.57 or above is required to use the reranker.
%pip install databricks-vectorsearch --force-reinstall
dbutils.library.restartPython()
from databricks.vector_search.reranker import DatabricksReranker
results = index.similarity_search(
query_text = "How to create a Vector Search index",
columns = ["id", "text", "parent_doc_summary", "date"],
num_results = 10,
query_type = "hybrid",
reranker=DatabricksReranker(columns_to_rerank=["text", "parent_doc_summary", "other_column"])
)
To ensure that you get latency information, set debug_level to at least 1.
export TOKEN=...
export INDEX_NAME=...
export WORKSPACE_URL=https://...
curl -X GET -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN" --url $WORKSPACE_URL/api/2.0/vector-search/indexes/$INDEX_NAME/query --data '{
  "num_results": 10,
  "query_text": "How to create a Vector Search index",
  "columns": ["id", "text", "parent_doc_summary", "date"],
  "reranker": {
    "model": "databricks_reranker",
    "parameters": {
      "columns_to_rerank": ["text", "parent_doc_summary"]
    }
  },
  "debug_level": 1
}'