vector_sum aggregate function
Applies to: Databricks Runtime 18.1 and above
Computes the element-wise sum of vectors in an aggregate. Returns a vector where each element is the sum of the corresponding elements across all input vectors.
Syntax
vector_sum(vectors) [FILTER ( WHERE cond ) ]
Arguments
- vectors: A column of ARRAY<FLOAT> expressions representing vectors. All vectors must have the same dimension.
- cond: An optional boolean expression filtering the rows used for aggregation.
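As an illustration of the optional FILTER clause, the following sketch aggregates only the rows for which the condition is true. The inline data and the names v and keep are hypothetical and exist only for this example; the expected result in the comment follows from the element-wise sum described on this page.

```sql
-- Hypothetical inline data: three 2-dimensional vectors with a boolean flag
SELECT vector_sum(v) FILTER (WHERE keep) AS sum_vector
FROM VALUES
  (array(1.0F, 2.0F), true),
  (array(3.0F, 4.0F), false),  -- excluded by the FILTER condition
  (array(5.0F, 6.0F), true)
AS t(v, keep);
-- Expected, per the definition above: [6.0, 8.0]
```

Using FILTER rather than a WHERE clause lets several differently filtered aggregates share one pass over the same rows.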
Returns
An ARRAY<FLOAT> value with the same dimension as the input vectors. Each element in the result is the sum of the corresponding elements across all input vectors.
NULL values and non-NULL vectors containing a NULL element are ignored in the aggregation. Returns NULL if all values in the group are invalid (NULL or non-NULL vectors with NULL elements). Returns an empty array [] if all input vectors are empty.
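The NULL-handling rules above can be sketched with inline data. The column name v and the values are hypothetical; the expected results in the comments are derived from the rules stated on this page, not from a recorded run.

```sql
-- One NULL row and one vector with a NULL element are skipped;
-- only [1.0, 2.0] and [10.0, 20.0] contribute to the sum.
SELECT vector_sum(v) AS sum_vector
FROM VALUES
  (array(1.0F, 2.0F)),
  (CAST(NULL AS ARRAY<FLOAT>)),
  (array(3.0F, CAST(NULL AS FLOAT))),
  (array(10.0F, 20.0F))
AS t(v);
-- Expected, per the rules above: [11.0, 22.0]

-- Every input is invalid, so the aggregate has nothing to sum.
SELECT vector_sum(v) AS sum_vector
FROM VALUES
  (CAST(NULL AS ARRAY<FLOAT>)),
  (array(1.0F, CAST(NULL AS FLOAT)))
AS t(v);
-- Expected, per the rules above: NULL
```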
Notes
- Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
- All input vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
- A non-NULL vector that contains a NULL element is treated as NULL.
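One way to work within these constraints is sketched below: cast other numeric array types to ARRAY<FLOAT> before aggregating, and inspect vector lengths before calling the function. The inline data and the column name v are hypothetical, and the cast can lose precision for values that FLOAT cannot represent exactly.

```sql
-- Hypothetical ARRAY<DOUBLE> input, cast explicitly to the supported type
SELECT vector_sum(CAST(v AS ARRAY<FLOAT>)) AS sum_vector
FROM VALUES
  (array(1.5D, 2.5D)),
  (array(0.5D, 0.5D))
AS t(v);
-- Expected, per the definition of the function: [2.0, 3.0]

-- Pre-check: count rows per vector length. More than one distinct length
-- indicates input that would not satisfy the same-dimension requirement.
SELECT array_size(v) AS dim, count(*) AS num_rows
FROM VALUES
  (array(1.0F, 2.0F)),
  (array(1.0F, 2.0F, 3.0F))
AS t(v)
GROUP BY array_size(v)
ORDER BY dim;
```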
Error conditions
- VECTOR_DIMENSION_MISMATCH: Raised when the input vectors do not all have the same dimension.
Examples
SQL
-- Element-wise sum per category (with GROUP BY)
> SELECT category, vector_sum(embedding) AS sum_vector
    FROM vector_data
    GROUP BY category
    ORDER BY category;
 category: A, sum_vector: [5.0, 7.0, 9.0]
 category: B, sum_vector: [5.0, 3.0, 5.0]

-- Scalar aggregation (without GROUP BY)
> SELECT vector_sum(embedding) AS total_sum FROM vector_data;
 total_sum: [10.0, 10.0, 14.0]
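The queries above read from a vector_data table that this page does not define. The following is one hypothetical data set that is consistent with the results shown; the individual vectors are an assumption, chosen only so that the per-category and overall sums match.

```sql
-- Hypothetical sample data (not from the original page)
CREATE OR REPLACE TEMPORARY VIEW vector_data AS
SELECT * FROM VALUES
  ('A', array(1.0F, 2.0F, 3.0F)),
  ('A', array(4.0F, 5.0F, 6.0F)),  -- A sums to [5.0, 7.0, 9.0]
  ('B', array(2.0F, 1.0F, 2.0F)),
  ('B', array(3.0F, 2.0F, 3.0F))   -- B sums to [5.0, 3.0, 5.0]
AS t(category, embedding);
```

The F suffix produces FLOAT literals, so embedding has type ARRAY<FLOAT> without an explicit cast.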