Lakebase Search

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

Lakebase Search adds hybrid vector and keyword search to Lakebase Autoscaling projects. Enable it once in your project settings, then install the lakebase_vector and lakebase_text Postgres extensions to start building search features.

How it works

Lakebase Search is built on two Postgres extensions:

  • lakebase_vector adds approximate nearest-neighbor (ANN) vector search via the lakebase_ann index type. It is a drop-in companion to pgvector: the same vector types, distance operators, and query syntax work without modification. Internally, it uses IVF (inverted file) partitioning with RaBitQ quantization, which supports more than 1 billion vectors in a single index and builds up to 50-100x faster than HNSW. Indexes are storage-backed, so they survive scale-to-zero without warmup.

  • lakebase_text adds BM25 full-text search via the lakebase_bm25 index type. It is compatible with PostgreSQL's standard tsvector types and query operators. BM25 ranking accounts for term frequency, document length, and corpus-wide statistics simultaneously. Top-K pushdown (Block-Max WAND) retrieves only the K most relevant results from the index instead of scoring every match.
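
As a quick illustration of the pgvector compatibility described above, the following sketch evaluates the three standard pgvector distance operators on literal vectors. It assumes lakebase_vector (and therefore pgvector) is already installed in the database. The vector values are arbitrary. This page does not state whether lakebase_ann provides an operator class for every operator, so treat the sketch as a check of query syntax only, not of index support.

```sql
-- Assumes lakebase_vector (and therefore pgvector) is installed.
-- The literal vectors are arbitrary illustration values.
SELECT
    '[3, 4]'::vector <-> '[4, 3]'::vector AS l2_distance,        -- Euclidean (L2) distance
    '[3, 4]'::vector <#> '[4, 3]'::vector AS neg_inner_product,  -- pgvector returns the NEGATIVE inner product
    '[3, 4]'::vector <=> '[4, 3]'::vector AS cosine_distance;    -- 1 - cosine similarity
```

For all three operators, smaller values mean closer vectors, which is why similarity queries sort in ascending order.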

Requirements

  • Postgres 16 or later
  • Enabling Lakebase Search on a project is irreversible

Enable Lakebase Search

  1. In your Lakebase project settings, click Lakebase in the left navigation.
  2. Click Enable Lakebase Search.

Warning

Enabling Lakebase Search:

  • Restarts all computes in your project, dropping any active connections
  • Makes the lakebase_vector and lakebase_text extensions available to install
  • Cannot be turned off once enabled
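
Because enabling is irreversible and restarts computes, it may be worth checking the Postgres version and current connections first. The following is a minimal sketch that uses only standard Postgres settings and catalog views. It covers the database and compute you are connected to, not the whole project.

```sql
-- Confirm the Postgres version (the requirement is 16 or later).
SHOW server_version;

-- The same check as a boolean: server_version_num is 160000 or higher on Postgres 16+.
SELECT current_setting('server_version_num')::int >= 160000 AS meets_version_requirement;

-- List other sessions connected to this database.
-- These are the kind of connections that a compute restart would drop.
SELECT pid, usename, application_name, state
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();
```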

Install extensions

After enabling Lakebase Search, install the extensions in your database:

PostgreSQL
-- Required: vector search (CASCADE installs pgvector as a dependency)
CREATE EXTENSION IF NOT EXISTS lakebase_vector CASCADE;

-- Required: BM25 full-text search
CREATE EXTENSION IF NOT EXISTS lakebase_text;
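
To confirm that the installation succeeded, you can query the standard Postgres catalogs. This sketch assumes that pgvector is registered under its usual extension name, vector, and that the lakebase_ann and lakebase_bm25 index types appear as index access methods, which follows from their use in CREATE INDEX ... USING.

```sql
-- Installed extensions and their versions.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('lakebase_vector', 'lakebase_text', 'vector');

-- Index access methods provided by the extensions.
SELECT amname
FROM pg_am
WHERE amname IN ('lakebase_ann', 'lakebase_bm25');
```

If either query returns fewer rows than expected, the corresponding CREATE EXTENSION statement likely did not run in this database.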

Get started

The following example creates a documents table with both a vector column and a full-text search column, then runs vector and keyword queries:

PostgreSQL
-- Create a table with a vector column and a tsvector column
CREATE TABLE documents (
    id        SERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    body      TEXT NOT NULL,
    embedding VECTOR(3),
    body_tsv  TSVECTOR
);

-- Create a vector search index
CREATE INDEX ON documents USING lakebase_ann (embedding vector_cosine_ops);

-- Insert sample data and populate the tsvector column
INSERT INTO documents (title, body, embedding, body_tsv) VALUES
    ('Postgres overview', 'Postgres is an open-source relational database.', '[0.1, 0.2, 0.3]', to_tsvector('english', 'Postgres is an open-source relational database.')),
    ('Vector search guide', 'Vector search finds semantically similar results.', '[0.4, 0.5, 0.6]', to_tsvector('english', 'Vector search finds semantically similar results.')),
    ('Full-text search', 'BM25 ranking improves keyword search relevance.', '[0.7, 0.8, 0.9]', to_tsvector('english', 'BM25 ranking improves keyword search relevance.'));

-- Build the BM25 index after inserting data
-- BM25 computes corpus statistics at build time, not incrementally
CREATE INDEX documents_body_bm25 ON documents USING lakebase_bm25 (body_tsv);

-- Vector similarity search
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;

-- BM25 keyword search (lower score = more relevant)
SELECT id, title,
       body_tsv <@> to_bm25query(to_tsvector('english', 'database'), 'documents_body_bm25') AS score
FROM documents
ORDER BY score
LIMIT 5;
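
The two queries above can also be combined into a single hybrid ranking. The following sketch uses reciprocal rank fusion (RRF), a common fusion technique that this page does not prescribe. The constant 60, the candidate limit of 20, and the CTE and alias names are illustrative assumptions. Only the operators and the to_bm25query call come from the example above.

```sql
-- Hybrid search sketch: fuse vector and BM25 rankings with reciprocal rank fusion (RRF).
-- The constant 60 and the candidate limit of 20 are illustrative assumptions.
WITH vector_hits AS (
    -- Rank the nearest candidates by cosine distance (lower = closer).
    SELECT id, row_number() OVER (ORDER BY distance) AS rnk
    FROM (
        SELECT id, embedding <=> '[0.1, 0.2, 0.3]' AS distance
        FROM documents
        ORDER BY distance
        LIMIT 20
    ) AS v
),
keyword_hits AS (
    -- Rank the best keyword candidates by BM25 score (lower = more relevant).
    SELECT id, row_number() OVER (ORDER BY score) AS rnk
    FROM (
        SELECT id,
               body_tsv <@> to_bm25query(to_tsvector('english', 'database'), 'documents_body_bm25') AS score
        FROM documents
        ORDER BY score
        LIMIT 20
    ) AS k
)
SELECT d.id,
       d.title,
       -- A document missing from one list contributes 0 for that list.
       COALESCE(1.0 / (60 + vh.rnk), 0) + COALESCE(1.0 / (60 + kh.rnk), 0) AS rrf_score
FROM vector_hits AS vh
FULL OUTER JOIN keyword_hits AS kh USING (id)
JOIN documents AS d USING (id)
ORDER BY rrf_score DESC
LIMIT 5;
```

RRF works on ranks instead of raw scores, so cosine distances and BM25 scores never need to share a scale. Each inner query keeps the ORDER BY ... LIMIT shape of the standalone queries above.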

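Extensions

| Extension       | Purpose                                | Index type    |
|-----------------|----------------------------------------|---------------|
| lakebase_vector | ANN vector search, pgvector-compatible | lakebase_ann  |
| lakebase_text   | BM25 full-text search, FTS-compatible  | lakebase_bm25 |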