vector_avg aggregate function

Applies to: Databricks Runtime 18.1 and above

Computes the element-wise average of vectors in an aggregate. Returns a vector where each element is the arithmetic mean of the corresponding elements across all input vectors.

Syntax

vector_avg(vectors) [FILTER ( WHERE cond ) ]

Arguments

  • vectors: A column of ARRAY<FLOAT> expressions representing vectors. All vectors must have the same dimension.
  • cond: An optional boolean expression filtering the rows used for aggregation.
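As a minimal sketch of the optional FILTER clause, the query below computes two centroids in one pass, each restricted to a subset of rows. It assumes a table vector_data with columns category (STRING) and embedding (ARRAY<FLOAT>), matching the names used in the examples on this page; the aliases centroid_a and centroid_b are illustrative only.

```sql
-- Assumed table: vector_data(category STRING, embedding ARRAY<FLOAT>)
SELECT
  -- Average only the vectors whose category is 'A'
  vector_avg(embedding) FILTER (WHERE category = 'A') AS centroid_a,
  -- Average only the vectors whose category is 'B'
  vector_avg(embedding) FILTER (WHERE category = 'B') AS centroid_b
FROM vector_data;
```

Rows that fail the cond predicate are excluded before aggregation, so each result column reflects only its own subset.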

Returns

An ARRAY<FLOAT> value with the same dimension as the input vectors. Each element in the result is the average of the corresponding elements across all input vectors.

NULL values and non-NULL vectors containing a NULL element are ignored in the aggregation. Returns NULL if every value in the group is NULL or contains a NULL element. Returns an empty array [] if all input vectors are empty.
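The NULL handling described above can be illustrated with an inline table. This is a sketch under stated assumptions: the column name v and alias avg_v are hypothetical, and the F suffix is used so that the literals are typed as FLOAT.

```sql
-- Four input rows; only the first two are valid vectors.
SELECT vector_avg(v) AS avg_v
FROM VALUES
  (array(1.0F, 2.0F)),                  -- valid
  (array(3.0F, 4.0F)),                  -- valid
  (CAST(NULL AS ARRAY<FLOAT>)),         -- NULL vector: ignored
  (array(5.0F, CAST(NULL AS FLOAT)))    -- contains a NULL element: ignored
AS t(v);
```

If the documented rules hold, only the first two rows contribute, so the result is the element-wise mean of [1.0, 2.0] and [3.0, 4.0], that is [2.0, 3.0].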

Notes

  • Only ARRAY<FLOAT> is supported; other types such as ARRAY<DOUBLE> or ARRAY<DECIMAL> raise an error.
  • All input vectors must have the same dimension; otherwise the function raises VECTOR_DIMENSION_MISMATCH.
  • A non-NULL vector that contains a NULL element is treated as NULL.
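Because only ARRAY<FLOAT> is accepted, vectors stored with another element type need an explicit cast before aggregation. The following sketch assumes a hypothetical ARRAY<DOUBLE> column named embedding_double, built here from an inline table with the D literal suffix.

```sql
-- embedding_double is a hypothetical ARRAY<DOUBLE> column.
-- Cast each array to ARRAY<FLOAT> so that vector_avg accepts it.
SELECT vector_avg(CAST(embedding_double AS ARRAY<FLOAT>)) AS centroid
FROM VALUES
  (array(1.0D, 2.0D)),
  (array(3.0D, 4.0D))
AS t(embedding_double);
```

Casting DOUBLE to FLOAT reduces precision, so this approach suits cases where single-precision results are acceptable.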

Error conditions

  • VECTOR_DIMENSION_MISMATCH: raised when the input vectors do not all have the same dimension.

Examples

SQL
-- Element-wise average per category (with GROUP BY)
> SELECT category, vector_avg(embedding) AS centroid
FROM vector_data
GROUP BY category
ORDER BY category;
category: A, centroid: [3.0, 6.0, 9.0]
category: B, centroid: [2.0, 4.0, 6.0]

-- Scalar aggregation (no GROUP BY)
> SELECT vector_avg(embedding) AS overall_centroid FROM vector_data;
overall_centroid: [2.5, 5.0, 7.5]
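
The queries above do not show how vector_data is populated. One hypothetical dataset that is consistent with the listed results could be created as follows; the specific vectors are an assumption chosen so that the averages work out, not the data behind the original examples.

```sql
-- Hypothetical sample data: two vectors per category, typed as ARRAY<FLOAT>.
CREATE OR REPLACE TEMPORARY VIEW vector_data AS
SELECT * FROM VALUES
  ('A', array(1.0F,  2.0F,  3.0F)),
  ('A', array(5.0F, 10.0F, 15.0F)),
  ('B', array(1.0F,  2.0F,  3.0F)),
  ('B', array(3.0F,  6.0F,  9.0F))
AS t(category, embedding);
```

With this data, category A averages to [3.0, 6.0, 9.0], category B to [2.0, 4.0, 6.0], and all four vectors together to [2.5, 5.0, 7.5].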