percentile aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Returns the exact percentile value of expr at the specified percentage in a group.

Syntax

percentile ( [ALL | DISTINCT] expr, percentage [, frequency] ) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: An expression that evaluates to a numeric.

  • percentage: A numeric expression between 0 and 1 or an ARRAY of numeric expressions, each between 0 and 1.

  • frequency: An optional integral number literal greater than 0.

  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

DOUBLE if percentage is numeric, or an ARRAY of DOUBLE if percentage is an ARRAY.

Frequency describes the number of times expr must be counted. A frequency of 10 for a specific value is equivalent to that value appearing 10 times in the window at a frequency of 1. The default frequency is 1.

If DISTINCT is specified the function operates only on a unique set of expr values.

Examples

> SELECT percentile(col, 0.3) FROM VALUES (0), (10), (10) AS tab(col);
 6.0
> SELECT percentile(DISTINCT col, 0.3) FROM VALUES (0), (10), (10) AS tab(col);
 3.0
> SELECT percentile(col, 0.3, freq) FROM VALUES (0, 1), (10, 2) AS tab(col, freq);
 6.0
> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
 [2.5,7.5]