polars.Expr.str.replace_all

Expr.str.replace_all(
    pattern: str | Expr,
    value: str | Expr,
    *,
    literal: bool = False,
) → Expr

Replace all matching regex/literal substrings with a new string value.

Parameters:
pattern

A valid regular expression pattern, compatible with the Rust regex crate. When literal=True, it is matched as a plain string instead.

value

String that will replace each matched substring. In regex mode, $ introduces a capture-group reference (see Notes).

literal

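Treat pattern as a literal string rather than a regular expression. The value is then inserted verbatim, with no $ capture-group expansion.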

See also

replace

Notes

  • To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline flag syntax, for example (?i) for case-insensitive matching or (?m) for multi-line mode. See the regex crate’s section on grouping and flags for the full list of supported flags and for additional information about the use of inline expression modifiers.

  • The dollar sign ($) is a special character in the replacement value: it introduces a reference to a capture group. To insert a literal $, escape it by doubling it ($$), or set literal=True if you do not need a full regular expression pattern match. Otherwise, a $ followed by a name or number is read as a reference to a (potentially non-existent) capture group, and a reference to a group that does not exist is replaced with an empty string.

    In the example below, $ must be doubled to produce a literal dollar sign; otherwise it refers to a capture group (which may or may not exist):

    >>> df = pl.DataFrame({"text": ["ab12cd34ef", "gh45ij67kl"]})
    >>> df.with_columns(
    ...     # the replacement pattern refers back to the capture group
    ...     text1=pl.col("text").str.replace_all(r"(?<N>\d{2,})", "$N$"),
    ...     # doubling-up the `$` results in it appearing as a literal value
    ...     text2=pl.col("text").str.replace_all(r"(?<N>\d{2,})", "$$N$$"),
    ... )
    shape: (2, 3)
    ┌────────────┬──────────────┬──────────────┐
    │ text       ┆ text1        ┆ text2        │
    │ ---        ┆ ---          ┆ ---          │
    │ str        ┆ str          ┆ str          │
    ╞════════════╪══════════════╪══════════════╡
    │ ab12cd34ef ┆ ab12$cd34$ef ┆ ab$N$cd$N$ef │
    │ gh45ij67kl ┆ gh45$ij67$kl ┆ gh$N$ij$N$kl │
    └────────────┴──────────────┴──────────────┘
    

Examples

>>> df = pl.DataFrame({"id": [1, 2], "text": ["abcabc", "123a123"]})
>>> df.with_columns(pl.col("text").str.replace_all("a", "-"))
shape: (2, 2)
┌─────┬─────────┐
│ id  ┆ text    │
│ --- ┆ ---     │
│ i64 ┆ str     │
╞═════╪═════════╡
│ 1   ┆ -bc-bc  │
│ 2   ┆ 123-123 │
└─────┴─────────┘

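Capture groups are supported. Use $1 or ${1} in the value string to refer to the first capture group in the pattern, $2 or ${2} to refer to the second capture group, and so on. Named capture groups are referenced the same way, as $name or ${name}. Use the braced form when the reference is immediately followed by a letter, digit, or underscore: $1d would be read as a reference to a group named 1d, so write ${1}d instead.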

>>> df = pl.DataFrame({"word": ["hat", "hut"]})
>>> df.with_columns(
...     positional=pl.col.word.str.replace_all("h(.)t", "b${1}d"),
...     named=pl.col.word.str.replace_all("h(?<vowel>.)t", "b${vowel}d"),
... )
shape: (2, 3)
┌──────┬────────────┬───────┐
│ word ┆ positional ┆ named │
│ ---  ┆ ---        ┆ ---   │
│ str  ┆ str        ┆ str   │
╞══════╪════════════╪═══════╡
│ hat  ┆ bad        ┆ bad   │
│ hut  ┆ bud        ┆ bud   │
└──────┴────────────┴───────┘

Apply case-insensitive string replacement using the (?i) flag.

>>> df = pl.DataFrame(
...     {
...         "city": "Philadelphia",
...         "season": ["Spring", "Summer", "Autumn", "Winter"],
...         "weather": ["Rainy", "Sunny", "Cloudy", "Snowy"],
...     }
... )
>>> df.with_columns(
...     # apply case-insensitive string replacement
...     pl.col("weather").str.replace_all(
...         r"(?i)foggy|rainy|cloudy|snowy", "Sunny"
...     )
... )
shape: (4, 3)
┌──────────────┬────────┬─────────┐
│ city         ┆ season ┆ weather │
│ ---          ┆ ---    ┆ ---     │
│ str          ┆ str    ┆ str     │
╞══════════════╪════════╪═════════╡
│ Philadelphia ┆ Spring ┆ Sunny   │
│ Philadelphia ┆ Summer ┆ Sunny   │
│ Philadelphia ┆ Autumn ┆ Sunny   │
│ Philadelphia ┆ Winter ┆ Sunny   │
└──────────────┴────────┴─────────┘