dense_rank
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, skips the positions taken up by the ties, so the person who came in after the three tied for second would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
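The competition example above can be checked with a small plain-Python sketch. It does not use Spark and is not how Spark implements the function; the helper name `rank_and_dense_rank` and the sample finish times are hypothetical and chosen only to illustrate the two numbering schemes.

```python
def rank_and_dense_rank(scores):
    """Return (score, rank, dense_rank) tuples for scores sorted ascending."""
    ordered = sorted(scores)
    result = []
    rank = dense = 0
    previous = None
    for position, score in enumerate(ordered, start=1):
        if score != previous:
            rank = position  # rank jumps to the row's 1-based position
            dense += 1       # dense_rank advances by exactly one
            previous = score
        result.append((score, rank, dense))
    return result


# One winner, three people tied for second, then one more finisher.
finish_times = [10, 12, 12, 12, 15]
for row in rank_and_dense_rank(finish_times):
    print(row)
```

The last finisher gets rank 5 but dense rank 3, which is the gap described above.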
Syntax
Python
from pyspark.sql import functions as sf
sf.dense_rank()
Parameters
This function does not take any parameters.
Returns
pyspark.sql.Column: the column for calculating ranks.
Examples
Python
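from pyspark.sql import functions as sf
from pyspark.sql import Window
# Assumes an active SparkSession named `spark`.
# A list of ints with the "int" schema produces a single column named "value".
df = spark.createDataFrame([1, 1, 2, 3, 3, 4], "int")
# No partitionBy: the whole DataFrame is treated as one window partition.
w = Window.orderBy("value")
# Tied values share a rank, and the next distinct value gets the next integer.
df.withColumn("drank", sf.dense_rank().over(w)).show()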
Output
+-----+-----+
|value|drank|
+-----+-----+
|    1|    1|
|    1|    1|
|    2|    2|
|    3|    3|
|    3|    3|
|    4|    4|
+-----+-----+