Mosaic AI Vector Search: Cost management guide
This feature is in Public Preview.
This article describes how to effectively manage your costs when using Mosaic AI Vector Search. It covers the following topics:
- Vector search index and endpoint basics.
- Billing and usage monitoring.
- Sync modes.
- Best practices to optimize costs.
Mosaic AI Vector Search basics
Mosaic AI Vector Search is composed of:
- Vector search indexes: Indexes store your vectors for search and retrieval.
- Vector search endpoints: Each endpoint hosts one or more indexes for serving queries, up to a maximum of 50 indexes per endpoint. In many cases, you can combine smaller workloads on a single endpoint to lower total costs.
How vector search is priced
One vector search unit covers up to 2 million vectors of dimension 768 (or the equivalent). For example, 1 million vectors of dimension 1536 also count as one unit. Each endpoint has a base price and scales up automatically to match the total size of the indexes it serves.
Endpoints do not scale down automatically. If your vector count drops significantly (for example, from 4 million to 1.5 million vectors), you continue to pay for the higher capacity (two vector search units in this example) until you delete the endpoint and create a new one.
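The unit arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: `estimate_units` is a hypothetical helper, not part of any Databricks SDK, and it assumes that "the equivalent" means capacity scales linearly with the number of vectors multiplied by their dimension, which is consistent with the 1 million × 1536 example but is not confirmed as the exact billing rule.

```python
# Capacity of one vector search unit as described above:
# 2 million vectors at dimension 768.
VECTORS_PER_UNIT = 2_000_000
REFERENCE_DIMENSION = 768


def estimate_units(num_vectors: int, dimension: int) -> int:
    """Estimate the vector search units needed (hypothetical helper).

    Assumption: capacity scales linearly with num_vectors * dimension.
    Actual billing may differ; check the pricing page for your workload.
    """
    if num_vectors < 0 or dimension <= 0:
        raise ValueError("num_vectors must be >= 0 and dimension must be > 0")
    unit_capacity = VECTORS_PER_UNIT * REFERENCE_DIMENSION
    total_values = num_vectors * dimension
    # Integer ceiling division: a partially filled unit counts as a full unit.
    return -(-total_values // unit_capacity)


# The scale-down scenario from the text: 4 million vectors shrink to 1.5 million.
units_before = estimate_units(4_000_000, 768)
units_after = estimate_units(1_500_000, 768)
print(f"needed before: {units_before}, needed after: {units_after}")
```

Under these assumptions the required capacity drops from two units to one, but the endpoint keeps billing at the higher capacity until it is deleted and recreated.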
How to monitor usage and costs
Databricks provides a billable usage table, usage dashboards, and budget policies to help you monitor the usage and costs for Vector Search.
Billable usage table
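The following example query against the billable usage table (system.billing.usage) summarizes daily vector search usage by workload type (ingest, storage, or serving) and endpoint: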
WITH all_vector_search_usage AS (
  -- Classify each Vector Search billing record as ingest, storage, or serving.
  SELECT *,
    CASE WHEN usage_metadata.endpoint_name IS NULL THEN 'ingest'
         WHEN usage_type = 'STORAGE_SPACE' THEN 'storage'
         ELSE 'serving'
    END AS workload_type
  FROM system.billing.usage
  WHERE billing_origin_product = 'VECTOR_SEARCH'
),
daily_dbus AS (
  -- Aggregate usage per day, workspace, cloud, workload type, and endpoint.
  SELECT
    workspace_id,
    cloud,
    usage_date,
    workload_type,
    usage_metadata.endpoint_name AS vector_search_endpoint,
    -- Serving and ingest usage is reported in the dbus column.
    CASE WHEN workload_type IN ('serving', 'ingest') THEN SUM(usage_quantity)
         ELSE NULL
    END AS dbus,
    -- Storage usage is reported separately in the dsus column.
    CASE WHEN workload_type = 'storage' THEN SUM(usage_quantity)
         ELSE NULL
    END AS dsus
  FROM all_vector_search_usage
  GROUP BY 1, 2, 3, 4, 5
)
-- Sort in the outer query; an ORDER BY inside a CTE is not guaranteed to be preserved.
SELECT * FROM daily_dbus
ORDER BY 1, 2, 3, 4, 5 DESC;
For more details on the billable usage table, see Billable usage system table reference.
Additional queries are in the following example notebook.
Vector search system tables queries notebook
Usage dashboards
For information about usage dashboards that you can import to gain insights into cost drivers, including usage for vector search, see Usage dashboards.
Budget policies
Budget policies enable administrators to group and filter billing records across all Databricks serverless products, and provide a dedicated UI for tracking spending. To learn how to apply a budget policy to a vector search endpoint, see Mosaic AI Vector Search: Budget policies. For general information and details about how to create and manage budget policies, see Attribute usage with serverless budget policies.
How to manage index sync costs
You can configure your index to update in two ways:
- Triggered Sync: You call the API or Python SDK to trigger an index update. This is the most cost-effective option.
- Continuous Sync: The index is automatically updated with changes from the source Delta table with near real-time latency. This costs more because a streaming cluster is provisioned to handle the sync. If near real-time updates with seconds of latency are not critical, consider using Triggered Sync to reduce costs.
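As an illustration of Triggered Sync, the following sketch triggers an update for several indexes on one endpoint. It assumes the `databricks-vectorsearch` Python SDK, where `VectorSearchClient.get_index(endpoint_name=..., index_name=...)` returns an index object with a `sync()` method. `trigger_syncs` and the endpoint and index names in the comments are hypothetical and exist only for this example.

```python
from typing import Iterable, List


def trigger_syncs(client, endpoint_name: str, index_names: Iterable[str]) -> List[str]:
    """Trigger one sync per index and return the names that were triggered.

    `client` is expected to behave like
    databricks.vector_search.client.VectorSearchClient (an assumption of
    this sketch): get_index(...) returns an object exposing sync().
    """
    triggered = []
    for name in index_names:
        index = client.get_index(endpoint_name=endpoint_name, index_name=name)
        index.sync()  # Starts an update from the source Delta table.
        triggered.append(name)
    return triggered


# Example usage in a Databricks environment (placeholder names):
#
#   from databricks.vector_search.client import VectorSearchClient
#   trigger_syncs(
#       VectorSearchClient(),
#       "my_endpoint",
#       ["main.default.docs_index", "main.default.faq_index"],
#   )
```

Running a call like this on a schedule, for example from a job that runs after the source table is refreshed, keeps the index reasonably fresh without paying for a continuously running streaming cluster.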
Best practices for cost management
- Combine workloads on a single endpoint: If you anticipate under ~150 QPS in total across all indexes, you can serve them from a single endpoint to avoid paying the base endpoint cost multiple times.
- Monitor usage: Use the system billing tables and built-in usage dashboards to track capacity, usage, and costs.
- Scale down manually: Because endpoints do not scale down automatically, delete and recreate the endpoint when your vector count drops far enough that you no longer need the higher capacity.
- Choose the right sync mode: Use Triggered Sync instead of Continuous Sync where possible, to reduce streaming costs.
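The endpoint consolidation guideline can be expressed as a simple check. This is a minimal sketch: `can_share_endpoint` is a hypothetical helper, the QPS figures are made-up inputs, and the thresholds simply restate the guidance in this article (roughly 150 QPS in total and at most 50 indexes per endpoint) rather than hard service limits you should rely on without testing.

```python
from typing import Dict

# Thresholds taken from the guidance in this article.
QPS_GUIDELINE = 150          # combine workloads if total QPS stays under ~150
MAX_INDEXES_PER_ENDPOINT = 50


def can_share_endpoint(expected_qps_by_index: Dict[str, float]) -> bool:
    """Return True if the indexes look like candidates for a single endpoint.

    The input maps index names to their expected peak queries per second.
    """
    total_qps = sum(expected_qps_by_index.values())
    return (
        len(expected_qps_by_index) <= MAX_INDEXES_PER_ENDPOINT
        and total_qps < QPS_GUIDELINE
    )


# Hypothetical workloads: three small indexes with modest traffic.
workloads = {"docs_index": 40, "faq_index": 25, "tickets_index": 30}
print(can_share_endpoint(workloads))
```

A result of True only indicates that the workloads fit the rule of thumb. Latency requirements and traffic spikes may still justify separate endpoints.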
Additional resources
- Mosaic AI Vector Search pricing
- Usage dashboards and instructions
- Contact your Databricks account team if you would like additional guidance on forecasting your usage or leveraging cost estimation tools specific to your workloads.