corr

Returns a new Column containing the Pearson correlation coefficient of col1 and col2.

Syntax

Python
from pyspark.sql import functions as sf

sf.corr(col1, col2)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| col1 | pyspark.sql.Column or column name | First column to use in the correlation calculation. |
| col2 | pyspark.sql.Column or column name | Second column to use in the correlation calculation. |

Returns

pyspark.sql.Column: the Pearson correlation coefficient of the values in the two columns.

Examples

Python
from pyspark.sql import functions as sf
a = range(20)
b = [2 * x for x in range(20)]
df = spark.createDataFrame(zip(a, b), ["a", "b"])
df.agg(sf.corr("a", df.b)).show()
Output
+----------+
|corr(a, b)|
+----------+
|       1.0|
+----------+
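
The 1.0 in the example can be sanity-checked without Spark. The following is a minimal sketch of the textbook Pearson formula in plain Python. The helper name `pearson_corr` is hypothetical and is not part of PySpark. The sketch only illustrates the quantity being computed and makes no claim about how Spark evaluates it internally.

```python
import math


def pearson_corr(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences.

    Hypothetical helper for illustration only; not a PySpark API.
    Does not handle empty input or zero-variance input (division by zero).
    """
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sequence.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Same data as the example above: b is an exact linear function of a.
a = range(20)
b = [2 * x for x in range(20)]
r = pearson_corr(a, b)
print(r)
```

Because b is exactly 2 * a, the two sequences are perfectly linearly related, which is why the coefficient sits at the upper end of the [-1, 1] range.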