
drop (DataFrameNaFunctions)

Returns a new DataFrame omitting rows with null or NaN values. DataFrame.dropna and DataFrameNaFunctions.drop are aliases of each other.

Syntax

drop(how='any', thresh=None, subset=None)

Parameters

| Parameter | Type | Description |
|---|---|---|
| how | str, optional | Whether to drop a row if it contains any null or NaN value ('any', the default) or only if all its values are null or NaN ('all'). Ignored if thresh is specified. |
| thresh | int, optional | If specified, drop rows that have fewer than thresh non-null and non-NaN values. Overrides how. |
| subset | str, tuple, or list, optional | Column name or names to consider when checking for null or NaN values. If omitted, all columns are considered. |
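How the three parameters interact can be modeled in plain Python. This is an illustrative sketch of the rules described above, not Spark's implementation: `is_missing` and `keep_row` are hypothetical helpers, and each row is represented as a dict.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, following the null-or-NaN rule."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_row(row, how="any", thresh=None, subset=None):
    """Hypothetical helper: True if the row (a dict) would survive drop()."""
    if subset is None:
        cols = list(row)            # no subset: check every column
    elif isinstance(subset, str):
        cols = [subset]             # a single column name
    else:
        cols = list(subset)         # tuple or list of column names

    present = sum(not is_missing(row[c]) for c in cols)

    if thresh is not None:
        # thresh takes precedence; how is not consulted.
        return present >= thresh
    if how == "any":
        return present == len(cols)  # drop on any missing value
    if how == "all":
        return present > 0           # drop only when everything is missing
    raise ValueError("how must be 'any' or 'all'")


rows = [
    {"age": 10, "height": 80.0, "name": "Alice"},
    {"age": 5, "height": float("nan"), "name": "Bob"},
    {"age": None, "height": None, "name": "Tom"},
    {"age": None, "height": float("nan"), "name": None},
]

# how='any' is passed explicitly, but thresh=1 decides the outcome.
kept = [r["name"] for r in rows if keep_row(r, how="any", thresh=1)]
print(kept)
```

With `thresh=1`, any row holding at least one usable value is kept, even though `how='any'` on its own would have dropped it.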

Returns

DataFrame

Examples

Python
from pyspark.sql import Row
df = spark.createDataFrame([
    Row(age=10, height=80.0, name="Alice"),
    Row(age=5, height=float("nan"), name="Bob"),
    Row(age=None, height=None, name="Tom"),
    Row(age=None, height=float("nan"), name=None),
])

Drop the row if it contains any null or NaN value.

Python
df.na.drop().show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# +---+------+-----+

Drop the row only if all its values are null or NaN.

Python
df.na.drop(how='all').show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  10|  80.0|Alice|
# |   5|   NaN|  Bob|
# |NULL|  NULL|  Tom|
# +----+------+-----+

Drop rows that have fewer than thresh non-null and non-NaN values.

Python
df.na.drop(thresh=2).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# |  5|   NaN|  Bob|
# +---+------+-----+

Drop rows that have a null or NaN value in any of the specified columns.

Python
df.na.drop(subset=['age', 'name']).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# | 10|  80.0|Alice|
# |  5|   NaN|  Bob|
# +---+------+-----+