polars.Expr.replace

Expr.replace(
mapping: dict[Any, Any],
*,
default: Any = _NoDefault.no_default,
return_dtype: PolarsDataType | None = None,
) → Self

Replace values according to the given mapping.

A global string cache is required for lazily evaluated queries on columns of type Categorical.

Parameters:
mapping

Mapping of values to their replacement.

default

Value to use when the mapping does not contain the lookup value. Defaults to keeping the original value. Accepts expression input. Non-expression inputs are parsed as literals.

return_dtype

Set the return dtype explicitly, overriding the automatically determined return dtype.
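The interplay between mapping and default can be modelled in plain Python. The sketch below uses a hypothetical helper, replace_values, that operates on a Python list rather than a Polars column; it illustrates the documented behaviour and is not Polars' implementation. A private sentinel plays the role that _NoDefault.no_default plays in the signature, so that None stays usable as an explicit default.

```python
from typing import Any

# Hypothetical sentinel standing in for _NoDefault.no_default:
# it distinguishes "no default given" from an explicit default=None.
_NO_DEFAULT = object()


def replace_values(
    values: list[Any], mapping: dict[Any, Any], default: Any = _NO_DEFAULT
) -> list[Any]:
    """Pure-Python model of replace() applied to a list of values."""
    result = []
    for value in values:
        if value in mapping:
            # The value has an entry in the mapping: use its replacement.
            result.append(mapping[value])
        elif default is _NO_DEFAULT:
            # No default supplied: keep the original value.
            result.append(value)
        else:
            # Default supplied (possibly None): use it for unmapped values.
            result.append(default)
    return result


print(replace_values([1, 2, 2, 3], {2: 100}))             # [1, 100, 100, 3]
print(replace_values([1, 2, 2, 3], {2: 100}, default=0))  # [0, 100, 100, 0]
```

Using a sentinel instead of None as the "not given" marker is what allows default=None to mean "replace unmapped values with null" rather than "keep them".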

See also

str.replace

Examples

Replace a single value by another value. Values not in the mapping remain unchanged.

>>> df = pl.DataFrame({"a": [1, 2, 2, 3]})
>>> df.with_columns(pl.col("a").replace({2: 100}).alias("replaced"))
shape: (4, 2)
┌─────┬──────────┐
│ a   ┆ replaced │
│ --- ┆ ---      │
│ i64 ┆ i64      │
╞═════╪══════════╡
│ 1   ┆ 1        │
│ 2   ┆ 100      │
│ 2   ┆ 100      │
│ 3   ┆ 3        │
└─────┴──────────┘

Replace multiple values. Specify a default to set every value not found in the mapping to that value; here, default=None turns unmapped values into null.

>>> df = pl.DataFrame({"country_code": ["FR", "ES", "DE", None]})
>>> country_code_map = {
...     "CA": "Canada",
...     "DE": "Germany",
...     "FR": "France",
...     None: "unspecified",
... }
>>> df.with_columns(
...     pl.col("country_code")
...     .replace(country_code_map, default=None)
...     .alias("replaced")
... )
shape: (4, 2)
┌──────────────┬─────────────┐
│ country_code ┆ replaced    │
│ ---          ┆ ---         │
│ str          ┆ str         │
╞══════════════╪═════════════╡
│ FR           ┆ France      │
│ ES           ┆ null        │
│ DE           ┆ Germany     │
│ null         ┆ unspecified │
└──────────────┴─────────────┘
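None plays two different roles in that example: as a mapping key it matches null entries, and as the default it turns unmapped values into null. In plain Python, dict.get behaves like default=None, so the following sketch (a model using Python lists, not Polars code) reproduces the replaced column.

```python
# Same mapping as in the example; the None key matches missing (null) codes.
country_code_map = {
    "CA": "Canada",
    "DE": "Germany",
    "FR": "France",
    None: "unspecified",
}
country_codes = ["FR", "ES", "DE", None]

# dict.get returns None for keys absent from the mapping, mirroring default=None.
replaced = [country_code_map.get(code) for code in country_codes]
print(replaced)  # ['France', None, 'Germany', 'unspecified']
```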

The return type can be overridden with the return_dtype argument.

>>> df = df.with_row_count()
>>> df.select(
...     "row_nr",
...     pl.col("row_nr")
...     .replace({1: 10, 2: 20}, default=0, return_dtype=pl.UInt8)
...     .alias("replaced"),
... )
shape: (4, 2)
┌────────┬──────────┐
│ row_nr ┆ replaced │
│ ---    ┆ ---      │
│ u32    ┆ u8       │
╞════════╪══════════╡
│ 0      ┆ 0        │
│ 1      ┆ 10       │
│ 2      ┆ 20       │
│ 3      ┆ 0        │
└────────┴──────────┘
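A UInt8 column can only hold integers from 0 to 255, so with return_dtype=pl.UInt8 every replacement value and the default must fall in that range. The sketch below, built around a hypothetical helper replace_as_uint8, models this constraint with an explicit range check in plain Python; how Polars itself reports an out-of-range value is not modelled here.

```python
from typing import Any

# Bounds of an unsigned 8-bit integer, the value range of pl.UInt8.
UINT8_MIN, UINT8_MAX = 0, 255


def replace_as_uint8(
    values: list[Any], mapping: dict[Any, int], default: int
) -> list[int]:
    """Replace values, then check that each result fits in UInt8."""
    result = []
    for value in values:
        new_value = mapping.get(value, default)
        if not UINT8_MIN <= new_value <= UINT8_MAX:
            raise ValueError(f"{new_value!r} does not fit in UInt8")
        result.append(new_value)
    return result


print(replace_as_uint8([0, 1, 2, 3], {1: 10, 2: 20}, default=0))  # [0, 10, 20, 0]
```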

To use values from other columns as the default, first construct a struct column. Its first field must be the column whose values are replaced; the remaining fields can then be referenced in the default expression.

>>> df.with_columns(
...     pl.struct("country_code", "row_nr")
...     .replace(
...         mapping=country_code_map,
...         default=pl.col("row_nr").cast(pl.Utf8),
...     )
...     .alias("replaced")
... )
shape: (4, 3)
┌────────┬──────────────┬─────────────┐
│ row_nr ┆ country_code ┆ replaced    │
│ ---    ┆ ---          ┆ ---         │
│ u32    ┆ str          ┆ str         │
╞════════╪══════════════╪═════════════╡
│ 0      ┆ FR           ┆ France      │
│ 1      ┆ ES           ┆ 1           │
│ 2      ┆ DE           ┆ Germany     │
│ 3      ┆ null         ┆ unspecified │
└────────┴──────────────┴─────────────┘
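Conceptually, the struct lets the default be computed per row from the other fields. The plain-Python sketch below (hypothetical variable names, not Polars code) models each struct value as a (country_code, row_nr) tuple: the first element is the lookup key, and the row number converted to a string is the fallback.

```python
country_code_map = {
    "CA": "Canada",
    "DE": "Germany",
    "FR": "France",
    None: "unspecified",
}
# Each tuple plays the role of one struct value: (country_code, row_nr).
rows = [("FR", 0), ("ES", 1), ("DE", 2), (None, 3)]

replaced = [
    # Look up the first field; if it is not mapped, fall back to str(row_nr),
    # mirroring default=pl.col("row_nr").cast(pl.Utf8).
    country_code_map[code] if code in country_code_map else str(row_nr)
    for code, row_nr in rows
]
print(replaced)  # ['France', '1', 'Germany', 'unspecified']
```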