
approx_percentile

Returns the approximate percentile of the numeric column col. The result is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of the col values is less than or equal to that value.
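One way to read this definition is the nearest-rank rule applied to exact data: sort the values and take the element at 1-based rank ceil(percentage * n). The pure-Python sketch below illustrates that reading only. The helper name nearest_rank_percentile is hypothetical, and it is an assumption that this rule matches the exact target that approx_percentile approximates; it is not a description of Spark's implementation.

```python
import math


def nearest_rank_percentile(values, percentage):
    """Exact nearest-rank percentile (illustrative helper, not a Spark API)."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)  # sort from least to greatest
    if not ordered:
        return None
    # 1-based rank; clamp to 1 so that percentage == 0.0 returns the minimum.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


print(nearest_rank_percentile(range(1, 11), 0.5))  # prints 5
```

Under this reading the result is always one of the input values, never an interpolated value between two of them.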

Syntax

Python
from pyspark.sql import functions as sf

sf.approx_percentile(col, percentage, accuracy=10000)

Parameters

| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or str | Input column. |
| percentage | pyspark.sql.Column, float, list of floats or tuple of floats | Percentage as a decimal fraction (must be between 0.0 and 1.0). When percentage is an array, each value must be between 0.0 and 1.0. |
| accuracy | pyspark.sql.Column or int | A positive numeric literal that controls approximation accuracy at the cost of memory. A higher value yields better accuracy; 1.0/accuracy is the relative error of the approximation. Default: 10000. |
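To make the accuracy trade-off concrete, the sketch below converts accuracy into the relative error 1.0/accuracy and then into an approximate rank tolerance for a given row count. Reading the relative error as a bound on rank deviation (relative error times the number of rows) is an assumption made for illustration, and both helper names are hypothetical.

```python
def relative_error(accuracy):
    """Relative error implied by an accuracy setting: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return 1.0 / accuracy


def approx_rank_tolerance(accuracy, row_count):
    """Assumed rank tolerance in rows: relative error times row count."""
    return relative_error(accuracy) * row_count


# Compare a low setting, the default, and the value used in the examples.
for accuracy in (100, 10000, 1000000):
    print(accuracy, relative_error(accuracy), approx_rank_tolerance(accuracy, 1000))
```

With the default accuracy of 10000 and 1000 input rows, the tolerance under this reading is about 0.1 rows, which is below a single row.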

Returns

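pyspark.sql.Column: the approximate percentile of the numeric column. When percentage is a list or tuple, the result is an array with one value per requested percentage, as in Example 1.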

Examples

Example 1: Calculate approximate percentiles

Python
from pyspark.sql import functions as sf
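# Assumes an active SparkSession named spark.
# key cycles through 0, 1, 2; value is standard normal noise shifted by key * 10.
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
# Request the 25th, 50th, and 75th percentiles in a single call.
df.select(
    sf.approx_percentile("value", [0.25, 0.5, 0.75], 1000000)
).show(truncate=False)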
Output
+----------------------------------------------------------+
|approx_percentile(value, array(0.25, 0.5, 0.75), 1000000) |
+----------------------------------------------------------+
|[0.7264430125286..., 9.98975299938..., 19.335304783039...]|
+----------------------------------------------------------+
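As a rough plausibility check on the output above, the data can be modeled as an equal mixture of three unit-variance normal distributions centered at 0, 10, and 20. The sketch below computes the theoretical quartiles of that mixture with the standard library only. The equal-weight assumption is approximate (1000 rows do not split evenly three ways), and the helper names mixture_cdf and mixture_quantile are hypothetical.

```python
from statistics import NormalDist

CENTERS = (0.0, 10.0, 20.0)  # key * 10 for key in {0, 1, 2}


def mixture_cdf(x):
    """CDF of an equal-weight mixture of N(center, 1) distributions."""
    return sum(NormalDist(mu=c, sigma=1.0).cdf(x) for c in CENTERS) / len(CENTERS)


def mixture_quantile(p, lo=-10.0, hi=30.0, tol=1e-9):
    """Invert mixture_cdf by bisection on the interval [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for p in (0.25, 0.5, 0.75):
    print(p, round(mixture_quantile(p), 3))
```

The theoretical quartiles come out near 0.674, 10.0, and 19.326, which is in line with the sampled values shown in the output.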

Example 2: Calculate approximate percentile by group

Python
from pyspark.sql import functions as sf
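# Assumes an active SparkSession named spark.
# Same synthetic data as Example 1: three clusters keyed by id % 3.
key = (sf.col("id") % 3).alias("key")
value = (sf.randn(42) + key * 10).alias("value")
df = spark.range(0, 1000, 1, 1).select(key, value)
# Compute the approximate median per key; percentage and accuracy are passed as Column literals.
df.groupBy("key").agg(
    sf.approx_percentile("value", sf.lit(0.5), sf.lit(1000000))
).sort("key").show()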
Output
+---+--------------------------------------+
|key|approx_percentile(value, 0.5, 1000000)|
+---+--------------------------------------+
|  0|                  -0.03519435193070...|
|  1|                     9.990389751837...|
|  2|                    19.967859769284...|
+---+--------------------------------------+