Skip to main content

percentile aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Returns the exact percentile value of expr at the specified percentage in a group.

Syntax

percentile ( [ALL | DISTINCT] expr, percentage [, frequency] ) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: An expression that evaluates to a numeric.
  • percentage: A numeric expression between 0 and 1 or an ARRAY of numeric expressions, each between 0 and 1.
  • frequency: An optional integral number literal greater than 0.
  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

DOUBLE if percentage is numeric, or an ARRAY<DOUBLE> if percentage is an ARRAY.

Frequency describes the number of times expr must be counted. A frequency of 10 for a specific value is equivalent to that value appearing 10 times in the window at a frequency of 1. The default frequency is 1.

If DISTINCT is specified the function operates only on a unique set of expr values.

Examples

SQL
> SELECT percentile(col, 0.3) FROM VALUES (0), (10), (10) AS tab(col);
6.0

> SELECT percentile(DISTINCT col, 0.3) FROM VALUES (0), (10), (10) AS tab(col);
3.0

> SELECT percentile(col, 0.3, freq) FROM VALUES (0, 1), (10, 2) AS tab(col, freq);
6.0

> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
[2.5,7.5]