corr aggregate function
Applies to: Databricks SQL, Databricks Runtime
Returns the Pearson correlation coefficient for a group of number pairs.
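For reference, the textbook definition of the Pearson correlation coefficient is given below. Here x_i and y_i stand for the values of expr1 and expr2 in the i-th of the n pairs that remain after any DISTINCT and FILTER processing, and \bar{x}, \bar{y} are their means. This notation is introduced only for illustration; it states the standard mathematical definition and does not describe how the engine computes the value internally. The result always lies between -1 and 1.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```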
Syntax
corr ( [ALL | DISTINCT] expr1, expr2 ) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER clause.
Arguments
expr1: An expression that evaluates to a numeric value.
expr2: An expression that evaluates to a numeric value.
cond: An optional boolean expression that filters the rows used for aggregation.
Returns
A DOUBLE.
If DISTINCT is specified, the function operates only on the unique set of (expr1, expr2) pairs.
Examples
> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
0.816496580927726
> SELECT corr(DISTINCT c1, c2) FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
0.8660254037844387
> SELECT corr(DISTINCT c1, c2) FILTER(WHERE c1 != c2)
FROM VALUES (3, 2), (3, 3), (3, 3), (6, 4) as tab(c1, c2);
1.0
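The three results above can be checked by hand with the Pearson formula. The arithmetic below is a worked illustration, not output produced by the function; it reproduces the displayed values up to floating-point rounding. Note that in the last example FILTER first drops both (3, 3) rows, leaving only two pairs, and any two points with distinct x and y values lie on a line, so the coefficient is exactly 1 for them.

```latex
\begin{aligned}
&\text{All rows: } (3,2),(3,3),(3,3),(6,4) \\
&\quad \bar{x} = \tfrac{15}{4} = 3.75,\quad \bar{y} = \tfrac{12}{4} = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})(y_i-\bar{y}) = (-0.75)(-1) + (-0.75)(0) + (-0.75)(0) + (2.25)(1) = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})^2 = 3(0.5625) + 5.0625 = 6.75,\quad \sum (y_i-\bar{y})^2 = 1 + 0 + 0 + 1 = 2 \\
&\quad r = \frac{3}{\sqrt{6.75 \cdot 2}} = \sqrt{\tfrac{2}{3}} \approx 0.8165 \\[4pt]
&\text{DISTINCT: } (3,2),(3,3),(6,4) \\
&\quad \bar{x} = 4,\quad \bar{y} = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})(y_i-\bar{y}) = (-1)(-1) + (-1)(0) + (2)(1) = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})^2 = 1 + 1 + 4 = 6,\quad \sum (y_i-\bar{y})^2 = 1 + 0 + 1 = 2 \\
&\quad r = \frac{3}{\sqrt{6 \cdot 2}} = \frac{\sqrt{3}}{2} \approx 0.8660 \\[4pt]
&\text{DISTINCT with FILTER (c1 != c2): } (3,2),(6,4) \\
&\quad \bar{x} = 4.5,\quad \bar{y} = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})(y_i-\bar{y}) = (-1.5)(-1) + (1.5)(1) = 3 \\
&\quad \textstyle\sum (x_i-\bar{x})^2 = 2.25 + 2.25 = 4.5,\quad \sum (y_i-\bar{y})^2 = 1 + 1 = 2 \\
&\quad r = \frac{3}{\sqrt{4.5 \cdot 2}} = \frac{3}{3} = 1
\end{aligned}
```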