vector_normalize
Normalizes a float vector to unit length using the specified norm degree. Degree defaults to 2.0 (Euclidean norm) if unspecified.
For the corresponding Databricks SQL function, see vector_normalize function.
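To make "unit length under a norm degree" concrete, the following plain-Python sketch divides each element by the vector's p-norm. It is an illustration only, not the Databricks implementation: the helper name `normalize_reference` is hypothetical, it covers only finite positive degrees, and raising an error for a zero vector is an assumption made for the sketch rather than documented behavior of `vector_normalize`.

```python
def normalize_reference(vector, degree=2.0):
    """Hypothetical reference helper: scale `vector` to unit p-norm."""
    if degree <= 0:
        raise ValueError("degree must be positive in this sketch")
    # p-norm: (sum of |x|^p) raised to the power 1/p
    norm = sum(abs(x) ** degree for x in vector) ** (1.0 / degree)
    if norm == 0.0:
        # Assumption for this sketch only; the built-in function may differ.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Euclidean norm of [3, 4] is 5, so the result is approximately [0.6, 0.8].
print([round(x, 3) for x in normalize_reference([3.0, 4.0])])

# With degree 1.0 the elements are divided by |3| + |4| = 7.
print([round(x, 3) for x in normalize_reference([3.0, 4.0], degree=1.0)])
```

After normalization, the p-norm of the result computed with the same degree is 1, up to floating-point rounding.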
Syntax
Python
from pyspark.sql import functions as dbf
dbf.vector_normalize(vector=<vector>, degree=<degree>)
Parameters
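| Parameter | Type | Description |
|---|---|---|
| vector | Column or column name (str) | Input vector column. |
| degree | Column, optional | Norm degree. Defaults to 2.0 (Euclidean norm) if unspecified. |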
Returns
pyspark.sql.Column: The normalized vector as an array of floats.
Examples
Python
from pyspark.sql import functions as dbf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
# Build a single-row DataFrame with one array<float> column named 'v'.
schema = StructType([StructField('v', ArrayType(FloatType()))])
df = spark.createDataFrame([([3.0, 4.0],)], schema)
# Normalize with degree 2.0 (Euclidean norm): each element is divided by 5.
df.select(dbf.vector_normalize('v', dbf.lit(2.0).cast('float'))).first()[0]
# [0.6..., 0.8...]