array_sort

Collection function: Sorts the input array in ascending order by default, or in the order defined by an optional comparator function. The elements of the input array must be orderable. With the default ordering, null elements are placed at the end of the returned array. Supports Spark Connect.

For the corresponding Databricks SQL function, see array_sort function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.array_sort(col=<col>, comparator=<comparator>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | Name of column or expression. |
| comparator | callable, optional | A binary function that returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error. |
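
The comparator contract is a standard three-way comparison. The sketch below is a plain-Python analogy only, not Spark code: `descending` is a hypothetical name, and `functools.cmp_to_key` stands in for the sort that array_sort performs. In Spark the comparator receives Column objects and must return a Column expression instead of a Python integer.

```python
from functools import cmp_to_key


def descending(x, y):
    # Hypothetical comparator for illustration.
    # Negative result: x sorts before y.
    # Zero: x and y are treated as equal.
    # Positive result: x sorts after y.
    return y - x


values = [2, 1, 3]
result = sorted(values, key=cmp_to_key(descending))
print(result)  # [3, 2, 1]
```

Swapping the operands (`x - y`) would give ascending order under the same contract.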

Returns

pyspark.sql.Column: A new column containing the sorted array.

Examples

Example 1: Sorting an array in default ascending order

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([([2, 1, None, 3],),([1],),([],)], ['data'])
df.select(dbf.array_sort(df.data).alias('r')).collect()
Output
[Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])]

Example 2: Sorting an array with a custom comparator

Python
from pyspark.databricks.sql import functions as dbf
df = spark.createDataFrame([(["foo", "foobar", None, "bar"],),(["foo"],),([],)], ['data'])
df.select(dbf.array_sort(
    "data",
    lambda x, y: dbf.when(x.isNull() | y.isNull(), dbf.lit(0)).otherwise(dbf.length(y) - dbf.length(x))
).alias("r")).collect()
Output
[Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])]
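
In Example 2 the comparator returns 0 whenever either element is null, so a null compares as equal to every other element and stays in the middle of the result. If nulls should sort last, the comparator has to rank them explicitly. The following is a plain-Python sketch of that logic under stated assumptions: `by_length_desc_nulls_last` is a hypothetical name, `None` stands in for a null element, and `functools.cmp_to_key` stands in for the array sort. In Spark the same branching would have to be written with Column expressions such as `when` and `otherwise`.

```python
from functools import cmp_to_key


def by_length_desc_nulls_last(x, y):
    # Hypothetical comparator: longer strings first, None values last.
    if x is None and y is None:
        return 0   # two nulls are tied
    if x is None:
        return 1   # x is null, so it sorts after y
    if y is None:
        return -1  # y is null, so x sorts before it
    return len(y) - len(x)  # longer string sorts first


data = ["foo", "foobar", None, "bar"]
sorted_data = sorted(data, key=cmp_to_key(by_length_desc_nulls_last))
print(sorted_data)  # ['foobar', 'foo', 'bar', None]
```

Python's `sorted` is stable, so "foo" stays ahead of "bar" because the two have equal length. This sketch does not show how Spark orders such ties.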