replace (DataFrameNaFunctions)

Returns a new DataFrame in which matching values are replaced with another value. DataFrame.replace and DataFrameNaFunctions.replace are aliases of each other. to_replace and value must have the same type, which can only be numeric, boolean, or string; value can also be None. When a value is replaced, the new value is cast to the type of the existing column.

Syntax

replace(to_replace, value=None, subset=None)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| to_replace | bool, int, float, str, list, or dict | The value to be replaced. If a dict, value is ignored and to_replace must map each value to its replacement. |
| value | bool, int, float, str, list, or None, optional | The replacement value. If a list, it must have the same length and type as to_replace. If a scalar and to_replace is a sequence, the scalar replaces every item in to_replace. |
| subset | str or list, optional | Column names to consider. A single column name can be passed as a string, as in the examples below. Columns in subset that do not have a matching data type are ignored. |
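The ways to_replace and value combine can be summarized in a small pure-Python sketch. build_replacement_map is a hypothetical helper written for illustration; it models the rules described above and is not PySpark's implementation. It also does not distinguish an omitted value from an explicit None.

```python
def build_replacement_map(to_replace, value=None):
    """Return an {old: new} dict following the parameter rules above.

    Hypothetical illustration only; not part of the PySpark API.
    """
    if isinstance(to_replace, dict):
        # A dict already pairs each value with its replacement, so value is ignored.
        return dict(to_replace)

    # Treat a scalar to_replace as a one-item sequence.
    if isinstance(to_replace, (list, tuple)):
        olds = list(to_replace)
    else:
        olds = [to_replace]

    if isinstance(value, (list, tuple)):
        # A list of replacements must line up one-to-one with to_replace.
        if len(value) != len(olds):
            raise ValueError("to_replace and value must have the same length")
        return dict(zip(olds, value))

    # A scalar (or None) is used as the replacement for every item.
    return {old: value for old in olds}


print(build_replacement_map({"Alice": "A", "Bob": "B"}, "ignored"))
# {'Alice': 'A', 'Bob': 'B'}
print(build_replacement_map(["Alice", "Bob"], ["A", "B"]))
# {'Alice': 'A', 'Bob': 'B'}
print(build_replacement_map(["Alice", "Bob"], "X"))
# {'Alice': 'X', 'Bob': 'X'}
print(build_replacement_map(10, 20))
# {10: 20}
```

Under this model, the dict form and the paired-list form describe the same mapping, so either can be used to express several replacements in one call.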

Returns

DataFrame

Notes

For numeric replacements, all values to be replaced must have unique floating-point representations. In case of conflicts (for example, {42: -1, 42.0: 1}), an arbitrary replacement is used.
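One way to catch such conflicts before calling replace is to group the candidate values by their float representation. The float_collisions helper below is hypothetical and written only for illustration; it uses Python's own float conversion as a stand-in for the floating-point representation mentioned above.

```python
from collections import defaultdict


def float_collisions(keys):
    """Group numeric keys that share one floating-point representation.

    Hypothetical pre-check for illustration; not part of the PySpark API.
    """
    groups = defaultdict(list)
    for key in keys:
        groups[float(key)].append(key)
    # Keep only the groups in which more than one key landed on the same float.
    return [group for group in groups.values() if len(group) > 1]


print(float_collisions([1, 2, 3]))
# []
print(float_collisions([42, 42.0]))
# [[42, 42.0]]
print(float_collisions([2**53, 2**53 + 1]))
# [[9007199254740992, 9007199254740993]]

# Python treats 42 and 42.0 as the same dict key, so this literal
# already holds a single entry before it is passed anywhere.
print(len({42: -1, 42.0: 1}))
# 1
```

Large integers are the less obvious case: above 2**53, distinct integers can round to the same float, so they count as conflicting keys under this check.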

Examples

Python
df = spark.createDataFrame([
    (10, 80, "Alice"),
    (5, None, "Bob"),
    (None, 10, "Tom"),
    (None, None, None)],
    schema=["age", "height", "name"])

Replace 10 with 20 in all columns.

Python
df.na.replace(10, 20).show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  20|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    20|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+
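The name column is untouched in this example because it is a string column and the value being replaced is numeric. The sketch below models that column filtering in plain Python. The columns_considered helper, the type-name strings, and the NUMERIC_TYPES grouping are assumptions made for illustration (decimal types are left out for brevity); this is not PySpark's internal logic.

```python
# Illustrative model only: this grouping of type names is an assumption.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def value_kind(value):
    """Classify a value as 'boolean', 'numeric', or 'string'."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def column_kind(type_name):
    """Map a column type name onto the same three kinds, or 'other'."""
    if type_name in NUMERIC_TYPES:
        return "numeric"
    if type_name in ("boolean", "string"):
        return type_name
    return "other"


def columns_considered(schema, to_replace, subset=None):
    """Return the columns whose type matches the value being replaced."""
    if subset is None:
        names = list(schema)
    elif isinstance(subset, str):
        names = [subset]
    else:
        names = list(subset)
    kind = value_kind(to_replace)
    return [name for name in names if column_kind(schema[name]) == kind]


schema = {"age": "bigint", "height": "bigint", "name": "string"}
print(columns_considered(schema, 10))
# ['age', 'height']
print(columns_considered(schema, "Alice"))
# ['name']
print(columns_considered(schema, 10, subset=["age", "name"]))
# ['age']
```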

Replace 'Alice' with null in all columns.

Python
df.na.replace('Alice', None).show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|NULL|
# |   5|  NULL| Bob|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+

Replace 'Alice' with 'A' and 'Bob' with 'B' in the name column.

Python
df.na.replace(['Alice', 'Bob'], ['A', 'B'], 'name').show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|   A|
# |   5|  NULL|   B|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+

Replace 10 with 18 in the age column.

Python
df.na.replace(10, 18, 'age').show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  18|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    10|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+