hash

Calculates the hash code of the given columns and returns the result as an int column. Supports Spark Connect.

For the corresponding Databricks SQL function, see the `hash` function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.hash(*cols)

Parameters

Parameter	Type	Description
`cols`	`pyspark.sql.Column` or `str`	One or more columns, or column names, to hash.

Returns

`pyspark.sql.Column`: the hash value as an int column.

Examples

Example 1: Computing the hash of a single column

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1')).show()

Output
+---+---+----------+
| c1| c2|  hash(c1)|
+---+---+----------+
|ABC|DEF|-757602832|
+---+---+----------+

Example 2: Computing the hash of multiple columns

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
df.select('*', dbf.hash('c1', df.c2)).show()

Output
+---+---+------------+
| c1| c2|hash(c1, c2)|
+---+---+------------+
|ABC|DEF|   599895104|
+---+---+------------+
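To check or reproduce values outside Spark, the following pure-Python script is a minimal sketch of how Apache Spark's built-in `hash` expression computes its result: a 32-bit Murmur3 (x86_32) hash with seed 42, where the hash of each column becomes the seed for the next column and null values leave the running hash unchanged. It assumes that `dbf.hash` behaves like Apache Spark's `hash` and handles only string values, encoded as UTF-8; Spark hashes integers, longs, structs, and other types differently. It also follows Spark's legacy handling of trailing bytes, which differs from reference Murmur3. The names `murmur3_spark_bytes` and `spark_like_hash` are hypothetical helpers, not part of any library. Under these assumptions, the single-column call reproduces the `-757602832` shown in Example 1.

```python
# Sketch (assumption): Spark-style 32-bit Murmur3 for string values only.
C1 = 0xCC9E2D51
C2 = 0x1B873593
MASK = 0xFFFFFFFF
SEED = 42  # seed that Spark's hash() starts from


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK
    k1 = _rotl32(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1: int, k1: int) -> int:
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1: int, length: int) -> int:
    """Final avalanche step; length is the input size in bytes."""
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def _to_int32(x: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed Java int."""
    return x - (1 << 32) if x & 0x80000000 else x


def murmur3_spark_bytes(data: bytes, seed: int) -> int:
    """Hypothetical helper: hash raw bytes the way Spark hashes a string."""
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    # Full 4-byte blocks, read as little-endian words.
    for i in range(0, aligned, 4):
        word = int.from_bytes(data[i:i + 4], "little")
        h1 = _mix_h1(h1, _mix_k1(word))
    # Legacy tail handling: each trailing byte is sign-extended and
    # mixed as if it were a full word.
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _to_int32(_fmix(h1, len(data)))


def spark_like_hash(*values):
    """Hypothetical helper: chain column hashes, skipping None (null)."""
    h = SEED
    for v in values:
        if v is None:
            continue  # a null leaves the running hash unchanged
        h = murmur3_spark_bytes(v.encode("utf-8"), h)
    return h


if __name__ == "__main__":
    print(spark_like_hash("ABC"))         # -757602832, as in Example 1
    print(spark_like_hash("ABC", "DEF"))  # "DEF" is hashed with the "ABC" hash as seed
```

Chaining the seed, rather than combining independent per-column hashes, means that column order matters: `hash(c1, c2)` and `hash(c2, c1)` generally differ.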
