polars.Expr.name.replace

Expr.name.replace(pattern: str, value: str, *, literal: bool = False) → Expr

Replace matching regex/literal substring in the name with a new value.

Parameters:
pattern

A valid regular expression pattern, compatible with the Rust regex crate.

value

String that will replace the matched substring.

literal

Treat pattern as a literal string, not a regex.

See also

Expr.str.replace

Notes

  • To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax. See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.

  • The dollar sign ($) is a special character related to capture groups; if you want to replace some target pattern with characters that include a literal $ you should escape it by doubling it up as $$, or set literal=True if you do not need a full regular expression pattern match. Otherwise, you will be referencing a (potentially non-existent) capture group.

Examples

>>> df = pl.DataFrame(
...     {
...         "n_foo": [1, 2, 3],
...         "n_bar": ["x", "y", "z"],
...     }
... )
>>> df.select(pl.all().name.replace(r"^n_", "col_"))
shape: (3, 2)
┌─────────┬─────────┐
│ col_foo ┆ col_bar │
│ ---     ┆ ---     │
│ i64     ┆ str     │
╞═════════╪═════════╡
│ 1       ┆ x       │
│ 2       ┆ y       │
│ 3       ┆ z       │
└─────────┴─────────┘
>>> df.select(pl.all().name.replace(r"(a|e|i|o|u)", "@")).schema
Schema({'n_f@@': Int64, 'n_b@r': String})

Apply case-insensitive string replacement using the (?i) flag.

>>> pl.DataFrame({"Foo": [1], "faz": [2]}).select(
...     pl.all().name.replace(r"(?i)^f", "b")
... )
shape: (1, 2)
┌─────┬─────┐
│ boo ┆ baz │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
└─────┴─────┘

Capture groups are supported. Use $1 or ${1} in the value string to refer to the first capture group in the pattern, $2 or ${2} to refer to the second capture group, and so on. You can also use named capture groups.

>>> df = pl.DataFrame({"x_1": [1], "x_2": [2], "group_id": ["xyz"]})
>>> df.select(pl.all().name.replace(r"_(\d+)$", ":$1"))
shape: (1, 3)
┌─────┬─────┬──────────┐
│ x:1 ┆ x:2 ┆ group_id │
│ --- ┆ --- ┆ ---      │
│ i64 ┆ i64 ┆ str      │
╞═════╪═════╪══════════╡
│ 1   ┆ 2   ┆ xyz      │
└─────┴─────┴──────────┘

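The ${1} form disambiguates the group reference from surrounding text. Without braces, $1m is read as a reference to a group named 1m, which does not exist and therefore expands to an empty string.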

>>> df = pl.DataFrame({"hat": [1], "hut": [2]}).with_row_index()
>>> df.with_columns(pl.all().name.replace(r"^h(.)t", "s$1m"))  
# ComputeError: the name 's' passed to `LazyFrame.with_columns` is duplicate
>>> df.with_columns(pl.all().name.replace(r"^h(.)t", "s${1}m"))
shape: (1, 5)
┌───────┬─────┬─────┬─────┬─────┐
│ index ┆ hat ┆ hut ┆ sam ┆ sum │
│ ---   ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╪═════╪═════╡
│ 0     ┆ 1   ┆ 2   ┆ 1   ┆ 2   │
└───────┴─────┴─────┴─────┴─────┘