
levenshtein

Computes the Levenshtein distance of the two given strings.

For the corresponding Databricks SQL function, see levenshtein function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.levenshtein(left=<left>, right=<right>, threshold=<threshold>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| left | pyspark.sql.Column or str | First column value. |
| right | pyspark.sql.Column or str | Second column value. |
| threshold | int, optional | If set, the function returns the Levenshtein distance when it is less than or equal to the threshold; otherwise it returns -1. |

Returns

pyspark.sql.Column: Levenshtein distance as integer value.

Examples

Python
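from pyspark.databricks.sql import functions as dbf
# One row with two string columns to compare.
df = spark.createDataFrame([('kitten', 'sitting',)], ['l', 'r'])
# Without a threshold, the full distance is returned (3 for this pair).
df.select('*', dbf.levenshtein('l', 'r')).show()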
Python
# The distance (3) exceeds the threshold of 2, so the result is -1.
df.select('*', dbf.levenshtein(df.l, df.r, 2)).show()
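
The values produced by these examples can be checked without a Spark session. The following is a minimal pure-Python sketch of the classic dynamic-programming edit distance, combined with the threshold rule described under Parameters. The helper names `edit_distance` and `levenshtein_with_threshold` are hypothetical and exist only for illustration; this is not how Spark implements the function, and null handling is not modeled.

```python
from typing import Optional


def edit_distance(left: str, right: str) -> int:
    """Levenshtein distance via two-row dynamic programming (illustrative helper)."""
    # previous[j] is the distance between the processed prefix of `left`
    # and the first j characters of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]  # distance from left[:i] to the empty string
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,                      # delete left_char
                current[j - 1] + 1,                   # insert right_char
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_with_threshold(
    left: str, right: str, threshold: Optional[int] = None
) -> int:
    """Apply the documented threshold rule on top of edit_distance (assumed semantics)."""
    distance = edit_distance(left, right)
    if threshold is not None and distance > threshold:
        return -1
    return distance


print(edit_distance("kitten", "sitting"))                  # 3
print(levenshtein_with_threshold("kitten", "sitting", 2))  # -1
```

The sketch computes the full distance first and then applies the threshold, which keeps the rule easy to read. An implementation that receives a threshold could instead stop early once the distance is known to exceed it.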