Mosaic AI Vector Search: Cost management guide

This article describes how to effectively manage your costs when using Mosaic AI Vector Search. It covers the following topics:

Vector search index and endpoint basics.
Billing and usage monitoring.
Sync modes.
Best practices to optimize costs.

To identify and delete empty endpoints, see Identify and delete empty Vector Search endpoints.

Mosaic AI Vector Search basics

Mosaic AI Vector Search is composed of:

Vector search indices: Indices store your vectors for search and retrieval.
Vector search endpoints: Each endpoint hosts one or more indices for serving queries, up to a maximum of 50 indices per endpoint. In many cases, you can combine smaller workloads on a single endpoint to lower total costs.
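
As a rough planning aid, the 50-index limit can be turned into a grouping of indices onto endpoints. The sketch below is illustrative only: `group_indices` and the index names are hypothetical, and the grouping ignores QPS and total vector capacity, which also affect whether consolidation is appropriate.

```python
MAX_INDICES_PER_ENDPOINT = 50  # per-endpoint limit stated above


def group_indices(index_names, max_per_endpoint=MAX_INDICES_PER_ENDPOINT):
    """Split index names into consecutive groups, one group per endpoint."""
    return [
        index_names[i : i + max_per_endpoint]
        for i in range(0, len(index_names), max_per_endpoint)
    ]


# Hypothetical index names, used only to show the grouping.
groups = group_indices([f"idx_{i}" for i in range(120)])
print([len(g) for g in groups])
```

Each group corresponds to one endpoint, so fewer groups means fewer endpoint base prices to pay.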

How vector search is priced

Databricks offers two endpoint options:

Standard endpoints. One vector search unit covers up to 2 million vectors of dimension 768 (or the equivalent). For example, if you have 1 million vectors of dimension 1536, that also counts as one unit.
Storage-optimized endpoints. One vector search unit covers up to 64 million vectors of dimension 768 (or the equivalent).
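
The unit sizes above can be turned into a rough estimate. The sketch below assumes that "or the equivalent" means capacity scales linearly with the product of vector count and dimension, which matches the 1 million × 1536 example but is not confirmed for other dimensions. The function name `estimate_units` and the endpoint type keys are hypothetical, and the one-unit floor is applied to both endpoint types as an assumption.

```python
BASE_DIMENSION = 768

# Vectors covered by one unit at dimension 768, per the list above.
VECTORS_PER_UNIT = {
    "standard": 2_000_000,
    "storage_optimized": 64_000_000,
}


def estimate_units(num_vectors, dimension, endpoint_type="standard"):
    """Estimate vector search units, assuming capacity scales with vectors * dimension."""
    capacity = VECTORS_PER_UNIT[endpoint_type] * BASE_DIMENSION
    load = num_vectors * dimension
    # Integer ceiling division avoids floating-point rounding.
    units = -(-load // capacity)
    # Assumption: an endpoint is never smaller than one unit.
    return max(1, units)


print(estimate_units(1_000_000, 1536))                     # 1M vectors, dimension 1536
print(estimate_units(4_000_000, 768))                      # 4M vectors, dimension 768
print(estimate_units(65_000_000, 768, "storage_optimized"))
```

Treat the result as an estimate for planning; the pricing page and the billable usage table remain the authoritative sources.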

For both options, each endpoint has a base price and scales up automatically to match the total size of the indices it is serving.

Standard endpoints do not scale down automatically. Even if you delete vectors or reduce the size of your indices, you continue paying for the higher capacity until you make changes manually.
Storage-optimized endpoints scale down automatically when an index is deleted. The minimum size for an endpoint is one vector search unit.

Important

Standard endpoints do not scale down automatically. If your vector count drops significantly (for example, from 4 million to 1.5 million vectors), you continue to pay for the higher capacity (two vector search units in this example) until you delete the endpoint and create a new one. This is only true for standard endpoints. Storage-optimized endpoints scale down automatically.
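
The effect of this behavior can be illustrated with a small model. The sketch below is a simplification, not the billing implementation: it assumes dimension-768 vectors, treats the billed size of a standard endpoint as the highest number of units ever required since the endpoint was created, and uses a hypothetical function name.

```python
VECTORS_PER_STANDARD_UNIT = 2_000_000  # at dimension 768, per the pricing section


def billed_units_over_time(vector_counts):
    """Return billed units per step for a standard endpoint that never scales down."""
    billed = []
    high_water_mark = 1  # assumed one-unit minimum
    for count in vector_counts:
        # Units needed for the current vector count (integer ceiling division).
        needed = max(1, -(-count // VECTORS_PER_STANDARD_UNIT))
        # A standard endpoint keeps the largest size it has reached.
        high_water_mark = max(high_water_mark, needed)
        billed.append(high_water_mark)
    return billed


# Vector count grows to 4 million, then drops to 1.5 million.
print(billed_units_over_time([1_000_000, 4_000_000, 1_500_000]))
```

In this model, only creating a new endpoint resets the high-water mark, which corresponds to calling the function again with the current vector count.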

How to monitor usage and costs

Databricks provides a billable usage table, usage dashboards, and budget policies to help you monitor the usage and costs for Vector Search.

Billable usage table

Here is an example query of the billable usage table:

SQL
WITH all_vector_search_usage AS (
  SELECT *,
         CASE WHEN usage_metadata.endpoint_name IS NULL THEN 'ingest'
              WHEN usage_type = 'STORAGE_SPACE' THEN 'storage'
              ELSE 'serving'
         END as workload_type
    FROM system.billing.usage
   WHERE billing_origin_product = 'VECTOR_SEARCH'
),

daily_dbus AS (
  SELECT
    workspace_id,
    cloud,
    usage_date,
    workload_type,
    usage_metadata.endpoint_name as vector_search_endpoint,
    CASE WHEN workload_type = 'serving' THEN SUM(usage_quantity)
         WHEN workload_type = 'ingest' THEN SUM(usage_quantity)
         ELSE null
         END as dbus,
    CASE WHEN workload_type = 'storage' THEN SUM(usage_quantity)
         ELSE null
         END as dsus
  FROM all_vector_search_usage
  GROUP BY 1,2,3,4,5
)
SELECT * FROM daily_dbus
ORDER BY 1,2,3,4,5 DESC;

For more details on the billable usage table, see Billable usage system table reference.

Additional queries are in the following example notebook.

Vector search system tables queries notebook

Usage dashboards

For information about usage dashboards that you can import to gain insights into cost drivers, including usage for vector search, see Usage dashboards.

Budget policies

Budget policies enable administrators to group and filter billing records across all Databricks serverless products, and provide a dedicated UI for tracking spending. To learn how to apply a budget policy to a vector search endpoint, see Mosaic AI Vector Search: Budget policies. For general information and details about how to create and manage budget policies, see Attribute usage with serverless budget policies.

How to manage index sync costs

You can configure your index to update in two ways:

Triggered Sync: You call the API or Python SDK to trigger an index update. This is the most cost-effective option.
Continuous Sync: The index is automatically updated with changes from the source Delta table with near real-time latency. This costs more because a streaming cluster is provisioned to handle the sync. If near real-time updates with seconds of latency are not critical, consider using Triggered Sync to reduce costs.
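
A triggered sync from the Python SDK might look like the following sketch. It assumes the `databricks-vectorsearch` package, whose `VectorSearchClient.get_index` and index `sync` methods are used here. The wrapper `trigger_index_sync` and the endpoint and index names are hypothetical placeholders. The client is passed in as an argument so that the function can be exercised without a workspace connection.

```python
def trigger_index_sync(client, endpoint_name, index_name):
    """Look up an index on an endpoint and start a sync run."""
    index = client.get_index(endpoint_name=endpoint_name, index_name=index_name)
    index.sync()


# Usage in a Databricks workspace (requires the databricks-vectorsearch package;
# the endpoint and index names below are placeholders):
#
#   from databricks.vector_search.client import VectorSearchClient
#   trigger_index_sync(VectorSearchClient(), "my_endpoint", "main.default.my_index")
```

Calling this from a scheduled job after the source Delta table is updated keeps the index reasonably fresh without a continuously running sync.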

Best practices for cost management

Combine workloads on a single endpoint: If you anticipate low QPS across all indices, you can combine your indices under a single endpoint to avoid multiple base endpoint costs. See Vector Search performance guide for more details.
Monitor usage: Use the system billing tables and built-in usage dashboards to track capacity, usage, and costs.
For standard endpoints, scale down manually: As explained above, for standard endpoints, you must delete the endpoint and recreate it if your vector count falls below a previous capacity threshold you no longer need. Storage-optimized endpoints scale down automatically when an index is deleted.
Choose the right sync mode: Use Triggered Sync instead of Continuous Sync where possible, to reduce streaming costs.
Identify and delete empty endpoints: See Identify and delete empty Vector Search endpoints.

Additional Resources

Mosaic AI Vector Search pricing
Usage dashboards and instructions
Contact your Databricks account team if you would like additional guidance on forecasting your usage or leveraging cost estimation tools specific to your workloads.
