corr aggregate function
Applies to: Databricks SQL Databricks Runtime
Returns the Pearson correlation coefficient between a group of number pairs.
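For reference, the Pearson coefficient is conventionally defined as shown below. The symbols are notation assumed for this illustration and do not appear in the function's syntax: x_i and y_i stand for the values of expr1 and expr2 in the i-th row of the group, n is the number of pairs, and the barred symbols are the group means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

By this definition the coefficient lies between -1 and 1.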
Syntax
corr ( [ALL | DISTINCT] expr1, expr2 ) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER clause.
Arguments
expr1
: An expression that evaluates to a numeric.
expr2
: An expression that evaluates to a numeric.
cond
: An optional boolean expression filtering the rows used for aggregation.
Returns
A DOUBLE.
If DISTINCT is specified, the function operates only on the unique set of expr1, expr2 pairs.
Examples
> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
0.816496580927726
> SELECT corr(DISTINCT c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
0.8660254037844387
> SELECT corr(DISTINCT c1, c2) FILTER(WHERE c1 != c2)
FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
1.0
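
As a cross-check of the three results above, the following Python sketch recomputes the coefficient directly from the textbook definition of Pearson correlation. It is an illustration only: the helper name `pearson` is hypothetical and not part of Databricks, and the sketch does not claim to mirror how the engine evaluates corr internally. DISTINCT is modeled by de-duplicating the pairs, and FILTER by dropping rows before aggregation.

```python
import math

def pearson(pairs):
    """Pearson correlation of (x, y) pairs, computed from the definition.

    Hypothetical helper for illustration. It does not handle groups where
    either column has zero variance (the denominator would be zero).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Same rows as tab(c1, c2) in the examples.
rows = [(3, 2), (3, 3), (3, 3), (6, 4)]

all_rows = pearson(rows)                    # models corr(c1, c2)
distinct_rows = pearson(sorted(set(rows)))  # models corr(DISTINCT c1, c2)
# Models corr(DISTINCT c1, c2) FILTER(WHERE c1 != c2).
filtered = pearson(sorted({(x, y) for x, y in rows if x != y}))

print(all_rows, distinct_rows, filtered)
```

The three printed values should agree with the documented results up to floating-point rounding. The duplicate (3, 3) row is what separates the first two results: it counts twice without DISTINCT and once with it.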