
sort

Returns a new DataFrame sorted by the specified column(s).

Syntax

sort(*cols: Union[int, str, Column, List[Union[int, str, Column]]], **kwargs: Any)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| cols | int, str, list, or Column, optional | List of Column objects, column names, or column ordinals to sort by. |
| ascending | bool or list, optional, default True | Boolean or list of booleans. Sort ascending vs. descending. Specify a list to set a separate sort order per column. If a list is specified, its length must equal the number of cols. |
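To make the list form of `ascending` concrete, here is a minimal plain-Python sketch that models it without a Spark session. `sort_rows` is a hypothetical helper, not part of PySpark; it only mirrors the documented rule that each sort column gets its own direction and that the two lists must have the same length. Null ordering and Spark's actual execution are not modeled, and the exception type is an assumption of this sketch.

```python
from typing import Any, Dict, List


def sort_rows(rows: List[Dict[str, Any]], cols: List[str],
              ascending: List[bool]) -> List[Dict[str, Any]]:
    """Hypothetical plain-Python model of df.sort(*cols, ascending=[...])."""
    if len(cols) != len(ascending):
        # Assumption: the real API reports this mismatch with its own error type.
        raise ValueError("ascending must have one entry per sort column")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-column ordering.
    for col, asc in reversed(list(zip(cols, ascending))):
        result.sort(key=lambda row: row[col], reverse=not asc)
    return result


rows = [
    {"age": 2, "name": "Alice"},
    {"age": 2, "name": "Bob"},
    {"age": 5, "name": "Bob"},
]
# Models: df.sort("age", "name", ascending=[False, True])
ordered = sort_rows(rows, ["age", "name"], [False, True])
print([(r["age"], r["name"]) for r in ordered])
# [(5, 'Bob'), (2, 'Alice'), (2, 'Bob')]
```

Passing a single boolean instead of a list would apply the same direction to every column; the sketch covers only the list form.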

Returns

DataFrame: Sorted DataFrame.

Notes

Column ordinals are 1-based, unlike the 0-based __getitem__. A negative ordinal sorts by the corresponding column in descending order.
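The ordinal rule can be read as a small mapping from an integer to a column and a direction. The sketch below is plain Python and does not call Spark. `resolve_ordinal` is a hypothetical helper, and raising `IndexError` for 0 or out-of-range ordinals is an assumption of this sketch rather than documented behavior.

```python
from typing import List, Tuple


def resolve_ordinal(columns: List[str], ordinal: int) -> Tuple[str, bool]:
    """Hypothetical helper: map a sort ordinal to (column name, ascending)."""
    if ordinal == 0 or abs(ordinal) > len(columns):
        # Assumption: invalid ordinals are rejected; the real error type may differ.
        raise IndexError(f"ordinal {ordinal} is out of range for {len(columns)} columns")
    # Ordinals are 1-based, so subtract 1 to get the 0-based list index.
    name = columns[abs(ordinal) - 1]
    # A negative ordinal means descending order.
    return name, ordinal > 0


columns = ["age", "name"]
print(resolve_ordinal(columns, 1))   # ('age', True)
print(resolve_ordinal(columns, -2))  # ('name', False)
```

Under this reading, `df.sort(-2)` on a DataFrame with columns `age` and `name` would order rows by `name` descending, on Spark versions that accept column ordinals.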

Examples

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([
    (2, "Alice"), (5, "Bob")], schema=["age", "name"])

df.sort(sf.asc("age")).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  2|Alice|
# |  5|  Bob|
# +---+-----+

df.sort(df.age.desc()).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  5|  Bob|
# |  2|Alice|
# +---+-----+

df.sort("age", ascending=False).show()
# +---+-----+
# |age| name|
# +---+-----+
# |  5|  Bob|
# |  2|Alice|
# +---+-----+

df = spark.createDataFrame([
    (2, "Alice"), (2, "Bob"), (5, "Bob")], schema=["age", "name"])
# orderBy is an alias of sort; sort by age descending, then by name ascending.
df.orderBy(sf.desc("age"), "name").show()
# +---+-----+
# |age| name|
# +---+-----+
# |  5|  Bob|
# |  2|Alice|
# |  2|  Bob|
# +---+-----+