
Best practices for Mosaic AI Vector Search

This article gives some tips for how to use Mosaic AI Vector Search most effectively.

Recommendations for optimizing latency

  • Use the service principal authorization flow to take advantage of network-optimized routes. Service principal authorization can improve per-query latency by up to 100 ms compared with personal access tokens.

  • Use the latest version of the Python SDK.

  • When testing, start with a concurrency of around 16 to 32. Higher concurrency does not yield higher throughput.

  • Use a model served with provisioned throughput (for example, bge-large-en or a fine-tuned version), instead of a pay-per-token foundation model.

  • Make sure you fetch the index only once, not on every query. Calling client.get_index(...).similarity_search(...) adds latency to every request. Instead, use the following:

    Python
    # Initialize index
    index = client.get_index(...)

    # Then later, for every query
    index.similarity_search(...)

    This is especially important when using a Vector Search index in MLflow environments: create the index object when you create the endpoint, then reuse it for every query.
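
The pattern above can be sketched end to end. In this illustrative example, `run_queries` is a hypothetical helper (not part of the SDK) that reuses a single pre-fetched index object and fans queries out at the recommended concurrency; the column names and `num_results` value are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def run_queries(index, queries, concurrency=16):
    """Run similarity searches against a single, pre-fetched index object.

    `index` is fetched once (index = client.get_index(...)) and reused for
    every query, avoiding the extra round trip incurred by calling
    client.get_index(...).similarity_search(...) per request.
    """
    def one_query(query_text):
        return index.similarity_search(
            query_text=query_text,
            columns=["id", "text"],  # illustrative column names
            num_results=5,
        )

    # Start with around 16-32 workers; higher concurrency does not
    # yield higher throughput.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(one_query, queries))
```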

When to use GPUs

  • Use CPUs only for basic testing and for small datasets (up to hundreds of rows).
  • For GPU compute type, Databricks recommends using GPU-small or GPU-medium.
  • For GPU compute scale-out, increasing concurrency can improve ingestion times, but the benefit depends on factors such as total dataset size and index metadata.

Working with images, video, or non-text data

  • Pre-compute the embeddings and use a Delta Sync Index with self-managed embeddings.
  • Don’t store binary formats such as images as metadata, as this adversely affects latency. Instead, store the path of the file as metadata.
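
As a sketch of these two recommendations, the following assembles rows for the Delta table that backs a Delta Sync Index with self-managed embeddings. `build_image_records` and `embed_image` are hypothetical names and the column names are placeholders; the point is that each row carries a precomputed embedding and a file path, never the raw image bytes.

```python
def build_image_records(image_paths, embed_image):
    """Precompute embeddings and keep only the file path as metadata.

    `embed_image` is any function mapping a file path to an embedding
    vector (a list of floats). Raw image bytes are deliberately NOT
    stored in the record, since storing binaries as metadata hurts
    query latency.
    """
    records = []
    for i, path in enumerate(image_paths):
        records.append(
            {
                "id": i,                        # primary key for the index
                "image_path": path,             # metadata: the path, not bytes
                "embedding": embed_image(path), # self-managed embedding
            }
        )
    return records
```

When creating the Delta Sync Index, point its self-managed embedding column parameter at the precomputed `embedding` column.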

Embedding sequence length

  • Check the embedding model sequence length to make sure documents are not being truncated. For example, BGE supports a context of 512 tokens. For longer context requirements, use gte-large-en-v1.5.
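
A minimal way to guard against truncation is to chunk documents before embedding them. In this sketch, whitespace word count stands in for the model's tokenizer, so treat the limit as approximate; in practice, count tokens with the embedding model's own tokenizer, since subword tokenization usually produces more tokens than words.

```python
def chunk_words(text, max_tokens=512):
    """Split text into chunks of at most `max_tokens` words.

    Word count is a rough stand-in for the embedding model's token
    count; use the model's tokenizer for an exact limit.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```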

Use Triggered sync mode to reduce costs

  • The most cost-effective option for updating a vector search index is Triggered. Select Continuous only if you need the index to track changes in the source table with a latency of seconds. Both sync modes perform incremental updates: only data that has changed since the last sync is processed.
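
The trade-off can be captured in a small illustrative helper. `choose_pipeline_type` and its 60-second threshold are assumptions for this sketch, not part of any API; the returned strings reflect the two sync modes described above.

```python
def choose_pipeline_type(max_staleness_seconds):
    """Pick a sync mode from a freshness requirement.

    Continuous sync keeps the index within seconds of the source table
    but runs continuously; triggered sync is the most cost-effective
    option when some staleness is acceptable. Both modes are
    incremental: only rows changed since the last sync are processed.
    """
    # Needing second-level freshness is the only reason to choose
    # continuous sync; the 60 s cutoff is an illustrative stand-in
    # for "latency of seconds".
    return "CONTINUOUS" if max_staleness_seconds < 60 else "TRIGGERED"
```

In triggered mode, run the index's sync operation on your own schedule (for example, from a scheduled job after the source table is updated).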