vector_cosine_similarity function

Applies to: Databricks Runtime 18.1 and above

Computes the cosine similarity between two vectors, measuring the cosine of the angle between them.
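
For reference, the standard mathematical definition of cosine similarity is sketched below. The symbols a, b, and n (the shared dimension) are notation introduced here for illustration; they do not appear in the function signature.

```latex
\[
\operatorname{cosine\_similarity}(a, b)
  = \cos\theta
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \; \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

The denominator is zero when either vector has zero magnitude, in which case the ratio is undefined.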

Syntax

vector_cosine_similarity(vector1, vector2)

Arguments

  • vector1: An ARRAY<FLOAT> expression representing the first vector.
  • vector2: An ARRAY<FLOAT> expression representing the second vector.

Returns

A FLOAT value representing the cosine similarity between the two vectors. The result ranges from -1.0 (opposite directions) to 1.0 (same direction), where 0.0 indicates orthogonality.

Returns NULL if the vectors are empty, if either vector has zero magnitude, or if either input is NULL or contains a NULL element.
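
The return rules above can be illustrated with a small pure-Python sketch. This is not the engine's implementation: the name cosine_similarity_sketch is hypothetical, None stands in for SQL NULL, a ValueError stands in for the dimension-mismatch error, and the order in which the checks run is an assumption. Python floats are double precision, so the last digits may differ from a 32-bit FLOAT result.

```python
import math
from typing import Optional, Sequence


def cosine_similarity_sketch(
    v1: Optional[Sequence[Optional[float]]],
    v2: Optional[Sequence[Optional[float]]],
) -> Optional[float]:
    """Illustrative model of the documented return rules (hypothetical helper)."""
    # A NULL input, or a NULL element inside either input, yields NULL.
    if v1 is None or v2 is None:
        return None
    if any(x is None for x in v1) or any(x is None for x in v2):
        return None
    # Assumption: the dimension check happens before the empty-vector check.
    if len(v1) != len(v2):
        raise ValueError("VECTOR_DIMENSION_MISMATCH")
    # Empty vectors yield NULL.
    if len(v1) == 0:
        return None
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # A zero-magnitude vector yields NULL instead of a division by zero.
    if norm1 == 0.0 or norm2 == 0.0:
        return None
    return dot / (norm1 * norm2)


print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(cosine_similarity_sketch([0.0, 0.0], [1.0, 2.0]))   # None
print(cosine_similarity_sketch([1.0, None], [1.0, 2.0]))  # None
```

Returning None for the zero-magnitude case mirrors the NULL result described above, so callers have to handle a missing score explicitly.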

Notes

  • Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
  • Both vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
  • Higher values indicate greater similarity. The function is commonly used to measure semantic similarity between embeddings.
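
As a rough illustration of the last two notes, the following Python sketch ranks a few made-up embeddings by their similarity to a query vector. The helper cosine, the vector values, and the names doc_a, doc_b, and doc_c are all hypothetical; the code models the arithmetic only and does not call the SQL function.

```python
import math


def cosine(u, v):
    # Both vectors must have the same dimension, as the notes above require.
    if len(u) != len(v):
        raise ValueError("dimension mismatch")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Zero magnitude gives no defined similarity.
    return dot / norm if norm else None


# Hypothetical 2-dimensional embeddings, chosen only for illustration.
query = [1.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1],   # nearly the same direction as the query
    "doc_b": [0.0, 1.0],   # orthogonal to the query
    "doc_c": [-1.0, 0.0],  # opposite direction
}

scores = {name: cosine(query, vec) for name, vec in docs.items()}
# Higher score means more similar, so sort in descending order.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_b', 'doc_c']
```

In SQL, the equivalent ranking would typically order rows by the function's result in descending order.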

Error conditions

  • VECTOR_DIMENSION_MISMATCH

Examples

SQL
-- Basic cosine similarity
> SELECT vector_cosine_similarity(array(1.0f, 2.0f, 3.0f), array(4.0f, 5.0f, 6.0f));
0.9746318461970762

-- Identical vectors (maximum similarity)
> SELECT vector_cosine_similarity(array(1.0f, 0.0f, 0.0f), array(1.0f, 0.0f, 0.0f));
1.0

-- Orthogonal vectors
> SELECT vector_cosine_similarity(array(1.0f, 0.0f), array(0.0f, 1.0f));
0.0