
vector_normalize

Normalizes a float vector to unit length under the specified norm degree by dividing each element by the vector's norm. If degree is not specified, it defaults to 2.0 (the Euclidean, or L2, norm).
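For reference, this is the standard p-norm definition that the degree parameter appears to follow. This is an assumption based on the L1, L2, and infinity-norm wording used for degree. Edge cases such as all-zero vectors are not covered. The worked case uses the vector [3.0, 4.0] from the example on this page.

```latex
\hat{v} = \frac{v}{\lVert v \rVert_p}, \qquad
\lVert v \rVert_p = \Big( \sum_{i=1}^{n} |v_i|^p \Big)^{1/p}, \qquad
\lVert v \rVert_\infty = \max_i |v_i|

% Worked case, p = 2, v = (3, 4):
\lVert v \rVert_2 = \sqrt{3^2 + 4^2} = 5, \qquad
\hat{v} = \left( \tfrac{3}{5}, \tfrac{4}{5} \right) = (0.6,\ 0.8)
```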

For the corresponding Databricks SQL function, see vector_normalize function.

Syntax

Python
from pyspark.sql import functions as dbf

dbf.vector_normalize(vector=<vector>, degree=<degree>)

Parameters

| Parameter | Type | Description |
|---|---|---|
| vector | pyspark.sql.Column or column name | Input vector column. |
| degree | pyspark.sql.Column or column name, optional | Norm degree (1.0 for L1, 2.0 for L2, float('inf') for infinity norm). Defaults to 2.0. |
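The following minimal pure-Python sketch of p-norm normalization shows how the degree value changes the result. It is not the Databricks implementation. The helper name `p_normalize` is hypothetical, and the sketch computes in 64-bit floats, while the Spark function returns an array of 32-bit floats, so trailing digits will differ. Raising an error on an all-zero vector is this sketch's own choice, because the behavior for zero vectors is not documented here.

```python
import math


def p_normalize(vec, degree=2.0):
    """Hypothetical reference helper: divide each element by the vector's p-norm."""
    if math.isinf(degree):
        # Infinity norm: the largest absolute element.
        norm = max(abs(x) for x in vec)
    else:
        # General p-norm: (sum of |x|^p) ** (1/p).
        norm = sum(abs(x) ** degree for x in vec) ** (1.0 / degree)
    # Assumption: a zero vector cannot be scaled to unit length, so this sketch raises.
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


v = [3.0, 4.0]
print(p_normalize(v, 1.0))           # approx [0.4286, 0.5714] (divided by 3 + 4 = 7)
print(p_normalize(v, 2.0))           # [0.6, 0.8] (divided by sqrt(9 + 16) = 5)
print(p_normalize(v, float("inf")))  # [0.75, 1.0] (divided by max(3, 4) = 4)
```

With degree 2.0, the sketch reproduces the [0.6, 0.8] result that the Spark example returns for the same vector.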

Returns

pyspark.sql.Column: The normalized vector as an array of floats.

Examples

Python
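from pyspark.sql import SparkSession
from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField

# Requires a Databricks runtime that provides vector_normalize; notebooks already define `spark`.
spark = SparkSession.builder.getOrCreate()

# One row holding the float vector [3.0, 4.0].
schema = StructType([StructField('v', ArrayType(FloatType()))])
df = spark.createDataFrame([([3.0, 4.0],)], schema)

# Normalize with the L2 norm (degree 2.0, passed as a float literal column).
df.select(dbf.vector_normalize('v', dbf.lit(2.0).cast('float'))).first()[0]
# [0.6..., 0.8...]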