hash

Calculates the hash code of the given columns and returns the result as an int column. Supports Spark Connect.

For the corresponding Databricks SQL function, see hash function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.hash(*cols)

Parameters

Parameter | Type | Description
cols | pyspark.sql.Column or str | One or more columns to compute on.

Returns

pyspark.sql.Column: The hash value as an int column.
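Because the result is an int column, the hash is a signed 32-bit value and can be negative, as one of the sample values in the examples on this page shows. The following is a minimal plain-Python sketch, not part of the Databricks API, of how such a value might be mapped to a non-negative bucket index once collected to the driver. The helper names `fits_in_int32` and `to_bucket` and the bucket count of 10 are assumptions made for illustration only.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of pyspark or Databricks.

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_in_int32(value: int) -> bool:
    """Return True if value is within the signed 32-bit integer range."""
    return INT32_MIN <= value <= INT32_MAX


def to_bucket(hash_value: int, num_buckets: int) -> int:
    """Map a possibly negative hash value to a bucket in [0, num_buckets).

    Python's % operator returns a non-negative result for a positive
    divisor, so no extra sign handling is needed here.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return hash_value % num_buckets


# Sample hash values copied from the examples on this page.
single_column_hash = -757602832
multi_column_hash = 599895104

print(fits_in_int32(single_column_hash))   # True
print(to_bucket(single_column_hash, 10))   # 8
print(to_bucket(multi_column_hash, 10))    # 4
```

The sketch only post-processes values that have already been computed. It does not reproduce the hashing itself.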

Examples

Example 1: Computing hash of a single column

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1')).show()
Output
+---+---+----------+
| c1| c2|  hash(c1)|
+---+---+----------+
|ABC|DEF|-757602832|
+---+---+----------+

Example 2: Computing hash of multiple columns

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1', df.c2)).show()
Output
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+