replace (DataFrameNaFunctions)
Returns a new DataFrame in which one value is replaced with another. DataFrame.replace and DataFrameNaFunctions.replace are aliases of each other. Values for to_replace and value must have the same type and can only be numerics, booleans, or strings. The replacement value can also be None. When replacing, the new value is cast to the type of the existing column.
Syntax
replace(to_replace, value=None, subset=None)
Parameters
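| Parameter | Type | Description |
|---|---|---|
| to_replace | bool, int, float, str, list, or dict | The value to be replaced. If a dict, then value is ignored or can be omitted, and to_replace must map each value to its replacement. |
| value | bool, int, float, str, or None, optional | The replacement value. If a list, must be the same length and type as to_replace. |
| subset | list, optional | Column names to consider. Columns in subset whose data type does not match the replacement are ignored. |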
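The call forms accepted by to_replace and value can all be read as one mapping from old values to new values. The following is a minimal pure-Python sketch of that reading, not PySpark's implementation: the helper name to_mapping is hypothetical, and it assumes that a scalar value paired with a list applies to every item in the list, as the open-source PySpark documentation describes.

```python
def to_mapping(to_replace, value=None):
    """Hypothetical helper (not part of PySpark): express a call form as {old: new}."""
    if isinstance(to_replace, dict):
        # Dict form: the mapping is already explicit, so `value` is not consulted.
        return dict(to_replace)
    # Treat a scalar `to_replace` as a one-item list.
    olds = list(to_replace) if isinstance(to_replace, (list, tuple)) else [to_replace]
    if isinstance(value, (list, tuple)):
        # List form: items are paired by position, so the lengths must match.
        if len(value) != len(olds):
            raise ValueError("to_replace and value must have the same length")
        return dict(zip(olds, value))
    # Scalar (or None) value: every listed item gets the same replacement.
    return {old: value for old in olds}


scalar_form = to_mapping(10, 20)
list_form = to_mapping(["Alice", "Bob"], ["A", "B"])
dict_form = to_mapping({"Alice": "A", "Bob": "B"})
null_form = to_mapping("Alice", None)

print(scalar_form)
print(list_form)
print(dict_form)
print(null_form)
```

Under this reading, the list form and the dict form in the sketch describe the same replacement, so the choice between them is mostly a matter of readability.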
Returns
DataFrame
Notes
For numeric replacements, all values to be replaced must have unique floating-point representations. In case of conflicts (for example, {42: -1, 42.0: 1}), an arbitrary replacement is used.
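One way to catch such conflicts before calling replace is to group the values to be replaced by their floating-point representation. The sketch below is plain Python and makes no claim about how Spark resolves a conflict; find_float_conflicts is a hypothetical helper name introduced only for illustration.

```python
from collections import defaultdict


def find_float_conflicts(keys):
    """Hypothetical helper: return {float_value: [keys]} for keys that share a float value."""
    groups = defaultdict(list)
    for key in keys:
        groups[float(key)].append(key)
    # Keep only the float values reached by more than one key.
    return {f: ks for f, ks in groups.items() if len(ks) > 1}


# 42 and 42.0 have the same floating-point representation.
small_conflict = find_float_conflicts([42, 42.0, 7])

# Large integers can collide too: 2**53 and 2**53 + 1 round to the same float.
large_conflict = find_float_conflicts([2**53, 2**53 + 1])

# In Python itself, 42 and 42.0 are equal dict keys, so the literal from the
# note already collapses to a single entry before it is passed anywhere.
collapsed = {42: -1, 42.0: 1}

print(small_conflict)
print(large_conflict)
print(len(collapsed))
```

An empty result from the helper means every value to be replaced has a distinct floating-point representation.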
Examples
df = spark.createDataFrame([
    (10, 80, "Alice"),
    (5, None, "Bob"),
    (None, 10, "Tom"),
    (None, None, None)],
    schema=["age", "height", "name"])
Replace 10 with 20 in all columns.
df.na.replace(10, 20).show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  20|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    20|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+
Replace 'Alice' with null in all columns.
df.na.replace('Alice', None).show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|NULL|
# |   5|  NULL| Bob|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+
Replace 'Alice' with 'A' and 'Bob' with 'B' in the name column.
df.na.replace(['Alice', 'Bob'], ['A', 'B'], 'name').show()
# +----+------+----+
# | age|height|name|
# +----+------+----+
# |  10|    80|   A|
# |   5|  NULL|   B|
# |NULL|    10| Tom|
# |NULL|  NULL|NULL|
# +----+------+----+
Replace 10 with 18 in the age column.
df.na.replace(10, 18, 'age').show()
# +----+------+-----+
# | age|height| name|
# +----+------+-----+
# |  18|    80|Alice|
# |   5|  NULL|  Bob|
# |NULL|    10|  Tom|
# |NULL|  NULL| NULL|
# +----+------+-----+