nth_value

Window function: returns the value at the `offset`-th row of the window frame (counting from 1), or null if the window frame contains fewer than `offset` rows.

When `ignoreNulls` is set to True, it returns the `offset`-th non-null value it sees. If all values are null, null is returned.

This is equivalent to the nth_value function in SQL.
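The semantics described above can be sketched in plain Python for a single window frame. This is only an illustrative model, not Spark's implementation: `nth_value_in_frame` and its `ignore_nulls` argument are hypothetical names, `None` stands in for null, and rejecting offsets below 1 is an assumption of the sketch.

```python
from typing import Any, Optional, Sequence


def nth_value_in_frame(
    frame: Sequence[Any], offset: int, ignore_nulls: bool = False
) -> Optional[Any]:
    """Hypothetical pure-Python model of nth_value over one window frame."""
    # Assumption of this sketch: offsets count from 1, so smaller values are rejected.
    if offset < 1:
        raise ValueError("offset must be a positive integer")
    # With ignore_nulls, only non-null (non-None) values are counted.
    candidates = [v for v in frame if v is not None] if ignore_nulls else list(frame)
    # A frame with fewer than `offset` candidates yields null (None).
    return candidates[offset - 1] if len(candidates) >= offset else None


frame = [None, 10, None, 20]
print(nth_value_in_frame(frame, 2))                            # 10
print(nth_value_in_frame(frame, 2, ignore_nulls=True))         # 20
print(nth_value_in_frame(frame, 5))                            # None
print(nth_value_in_frame([None, None], 1, ignore_nulls=True))  # None
```

Without `ignore_nulls`, the second row is returned as-is (10). With it, nulls are skipped first, so the second non-null value (20) is returned.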

Syntax

Python
from pyspark.sql import functions as sf

sf.nth_value(col, offset, ignoreNulls=False)

Parameters

Parameter	Type	Description
`col`	`pyspark.sql.Column` or column name	Name of column or expression.
`offset`	int	Number of the row to use as the value, counting from 1.
`ignoreNulls`	bool, optional	Whether to skip nulls when determining which row is the Nth value. Defaults to False.

Returns

pyspark.sql.Column: the value of the nth row of the window frame.

Examples

Example 1: Get the first value in window frame

Python
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
df.show()

Output
+---+---+
| c1| c2|
+---+---+
|  a|  1|
|  a|  2|
|  a|  3|
|  b|  8|
|  b|  2|
+---+---+

Python
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("nth_value", sf.nth_value("c2", 1).over(w)).show()

Output
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|        1|
|  a|  2|        1|
|  a|  3|        1|
|  b|  2|        2|
|  b|  8|        2|
+---+---+---------+

Example 2: Get the second value in window frame

Python
from pyspark.sql import functions as sf
from pyspark.sql import Window
df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
w = Window.partitionBy("c1").orderBy("c2")
df.withColumn("nth_value", sf.nth_value("c2", 2).over(w)).show()

Output
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|     NULL|
|  a|  2|        2|
|  a|  3|        2|
|  b|  2|     NULL|
|  b|  8|        8|
+---+---+---------+
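The NULL in the first row of each partition follows from the default window frame: when a window specifies an ordering but no explicit frame, Spark uses a frame that grows from the start of the partition to the current row, so the first row's frame holds only one value. The following plain-Python sketch models that behavior. `running_nth_value` is a hypothetical helper, and it simplifies the default range-based frame to a row-based one, which is equivalent here because no partition contains tied `c2` values.

```python
from typing import Any, List, Optional, Sequence


def running_nth_value(sorted_values: Sequence[Any], offset: int) -> List[Optional[Any]]:
    """Hypothetical model: nth_value over a frame that grows from the
    partition start up to and including the current row."""
    results = []
    for i in range(len(sorted_values)):
        # Frame for row i: every row from the partition start through row i.
        frame = sorted_values[: i + 1]
        # Fewer than `offset` rows in the frame yields null (None).
        results.append(frame[offset - 1] if len(frame) >= offset else None)
    return results


# Partitions from the example, already ordered by c2.
print(running_nth_value([1, 2, 3], 2))  # [None, 2, 2]   (partition "a")
print(running_nth_value([2, 8], 2))     # [None, 8]      (partition "b")
```

To have every row see the whole partition instead, the window can be given an explicit frame, for example `w.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)`.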
