
xxhash64

Calculates the hash code of the given columns using the 64-bit variant of the xxHash algorithm and returns the result as a long column. The hash computation uses an initial seed of 42. Supports Spark Connect.

For the corresponding Databricks SQL function, see xxhash64 function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.xxhash64(*cols)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| cols | pyspark.sql.Column or str | One or more columns to compute on. |

Returns

pyspark.sql.Column: the hash value as a long (signed 64-bit integer) column.
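Because a long is a signed 64-bit integer, the returned hash can be negative, whereas many xxHash tools print the digest as an unsigned integer or as hexadecimal. The following is a minimal sketch in plain Python of reinterpreting a collected hash value as its unsigned 64-bit bit pattern. The helper names `to_unsigned64` and `to_hex64` are hypothetical and not part of PySpark. Whether the result matches another tool's output also depends on how that tool seeds and encodes its input, which this sketch does not address.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # 2**64 - 1, all 64 bits set


def to_unsigned64(signed_value: int) -> int:
    """Reinterpret a signed 64-bit integer as its unsigned 64-bit bit pattern.

    Hypothetical helper for illustration; not part of PySpark.
    """
    if not -(1 << 63) <= signed_value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit long")
    # Masking maps negative values to their two's-complement representation.
    return signed_value & MASK_64


def to_hex64(signed_value: int) -> str:
    """Format a signed 64-bit integer as 16 zero-padded lowercase hex digits."""
    return format(to_unsigned64(signed_value), "016x")


# A non-negative long is unchanged; a negative long wraps into the upper half.
print(to_unsigned64(4105715581806190027))
print(to_hex64(-1))
```

This would typically be applied to values already brought to the driver, for example after `collect()`, rather than inside a Spark expression.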

Examples

Example 1: Computing xxhash64 of a single column

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.xxhash64('c1')).show()
Output
+---+---+-------------------+
| c1| c2|       xxhash64(c1)|
+---+---+-------------------+
|ABC|DEF|4105715581806190027|
+---+---+-------------------+

Example 2: Computing xxhash64 of multiple columns

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.xxhash64('c1', df.c2)).show()
Output
+---+---+-------------------+
| c1| c2|   xxhash64(c1, c2)|
+---+---+-------------------+
|ABC|DEF|3233247871021311208|
+---+---+-------------------+