first

Returns the first value in a group. By default the function returns the first value it sees. When ignorenulls is set to True, it returns the first non-null value it sees; if all values are null, null is returned. The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.

Syntax

Python
from pyspark.sql import functions as sf

sf.first(col, ignorenulls=False)

Parameters

| Parameter | Type | Description |
|---|---|---|
| col | pyspark.sql.Column or column name | Column to fetch the first value for. |
| ignorenulls | bool | If the first value is null, look for the first non-null value. False by default. |

Returns

pyspark.sql.Column: first value of the group.

Examples

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
df = df.orderBy(df.age)
df.groupby("name").agg(sf.first("age")).orderBy("name").show()
Output
+-----+----------+
| name|first(age)|
+-----+----------+
|Alice|      NULL|
|  Bob|         5|
+-----+----------+

To ignore any null values, set ignorenulls to True:

Python
df.groupby("name").agg(sf.first("age", ignorenulls=True)).orderBy("name").show()
Output
+-----+----------+
| name|first(age)|
+-----+----------+
|Alice|         2|
|  Bob|         5|
+-----+----------+