vector_cosine_similarity function
Applies to: Databricks Runtime 18.1 and above
Computes the cosine similarity between two vectors, measuring the cosine of the angle between them.
Syntax
vector_cosine_similarity(vector1, vector2)
Arguments
- vector1: An ARRAY<FLOAT> expression representing the first vector.
- vector2: An ARRAY<FLOAT> expression representing the second vector.
Returns
A FLOAT value representing the cosine similarity between the two vectors. The result ranges from -1.0 (opposite directions) to 1.0 (same direction), where 0.0 indicates orthogonality.
Returns NULL if the vectors are empty, if either vector has zero magnitude, or if either input is NULL or contains a NULL element.
Notes
- Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
- Both vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
- Higher values indicate greater similarity; the function is commonly used to measure semantic similarity in embedding spaces.
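The return and error rules above can be modeled outside the engine. The following is a minimal Python sketch, not the Databricks implementation: the name `cosine_similarity_sketch` is hypothetical, `None` stands in for SQL NULL, a `ValueError` stands in for VECTOR_DIMENSION_MISMATCH, and the order in which the checks run is an assumption. Python computes in double precision, so results can differ from the FLOAT result in the low digits.

```python
import math
from typing import Optional, Sequence


def cosine_similarity_sketch(
    v1: Optional[Sequence[Optional[float]]],
    v2: Optional[Sequence[Optional[float]]],
) -> Optional[float]:
    """Illustrative model of the documented semantics (hypothetical helper)."""
    # A NULL input yields NULL.
    if v1 is None or v2 is None:
        return None
    # Vectors of different dimension raise an error
    # (check order relative to the NULL rules is an assumption).
    if len(v1) != len(v2):
        raise ValueError("VECTOR_DIMENSION_MISMATCH")
    # Empty vectors yield NULL.
    if len(v1) == 0:
        return None
    # A NULL element in either vector yields NULL.
    if any(x is None for x in v1) or any(x is None for x in v2):
        return None
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # A zero-magnitude vector has no direction, so the result is NULL.
    if norm1 == 0.0 or norm2 == 0.0:
        return None
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    print(cosine_similarity_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(cosine_similarity_sketch([0.0, 0.0], [1.0, 2.0]))
```

Returning `None` rather than raising for zero-magnitude input mirrors the documented NULL result and avoids a division by zero.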
Error conditions
- VECTOR_DIMENSION_MISMATCH: The two vectors have different dimensions.
Examples
SQL
-- Basic cosine similarity
> SELECT vector_cosine_similarity(array(1.0f, 2.0f, 3.0f), array(4.0f, 5.0f, 6.0f));
0.9746318461970762
-- Identical vectors (maximum similarity)
> SELECT vector_cosine_similarity(array(1.0f, 0.0f, 0.0f), array(1.0f, 0.0f, 0.0f));
1.0
-- Orthogonal vectors
> SELECT vector_cosine_similarity(array(1.0f, 0.0f), array(0.0f, 1.0f));
0.0
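The first example can be checked by hand using the standard definition of cosine similarity, the dot product divided by the product of the vector magnitudes:

```latex
\[
\cos\theta
  = \frac{1\cdot 4 + 2\cdot 5 + 3\cdot 6}
         {\sqrt{1^2 + 2^2 + 3^2}\,\sqrt{4^2 + 5^2 + 6^2}}
  = \frac{32}{\sqrt{14}\,\sqrt{77}}
  = \frac{32}{\sqrt{1078}}
  \approx 0.9746
\]
```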