polars.Expr.str.replace_all
- Expr.str.replace_all(pattern, value, *, literal=False) → Expr
Replace all matching regex/literal substrings with a new string value.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
- value
String that will replace the matched substring.
- literal
Treat pattern as a literal string.
See also
Notes
To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax. See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.

The dollar sign ($) is a special character related to capture groups; if you want to replace some target pattern with characters that include a literal $ you should escape it by doubling it up as $$, or set literal=True if you do not need a full regular expression pattern match. Otherwise, you will be referencing a (potentially non-existent) capture group.

In the example below we need to double up $ to represent a literal dollar sign, otherwise we are referring to a capture group (which may or may not exist):

>>> df = pl.DataFrame({"text": ["ab12cd34ef", "gh45ij67kl"]})
>>> df.with_columns(
...     # the replacement pattern refers back to the capture group
...     text1=pl.col("text").str.replace_all(r"(?<N>\d{2,})", "$N$"),
...     # doubling-up the `$` results in it appearing as a literal value
...     text2=pl.col("text").str.replace_all(r"(?<N>\d{2,})", "$$N$$"),
... )
shape: (2, 3)
┌────────────┬──────────────┬──────────────┐
│ text       ┆ text1        ┆ text2        │
│ ---        ┆ ---          ┆ ---          │
│ str        ┆ str          ┆ str          │
╞════════════╪══════════════╪══════════════╡
│ ab12cd34ef ┆ ab12$cd34$ef ┆ ab$N$cd$N$ef │
│ gh45ij67kl ┆ gh45$ij67$kl ┆ gh$N$ij$N$kl │
└────────────┴──────────────┴──────────────┘
Examples
>>> df = pl.DataFrame({"id": [1, 2], "text": ["abcabc", "123a123"]})
>>> df.with_columns(pl.col("text").str.replace_all("a", "-"))
shape: (2, 2)
┌─────┬─────────┐
│ id  ┆ text    │
│ --- ┆ ---     │
│ i64 ┆ str     │
╞═════╪═════════╡
│ 1   ┆ -bc-bc  │
│ 2   ┆ 123-123 │
└─────┴─────────┘
Capture groups are supported. Use $1 or ${1} in the value string to refer to the first capture group in the pattern, $2 or ${2} to refer to the second capture group, and so on. You can also use named capture groups.

>>> df = pl.DataFrame({"word": ["hat", "hut"]})
>>> df.with_columns(
...     positional=pl.col.word.str.replace_all("h(.)t", "b${1}d"),
...     named=pl.col.word.str.replace_all("h(?<vowel>.)t", "b${vowel}d"),
... )
shape: (2, 3)
┌──────┬────────────┬───────┐
│ word ┆ positional ┆ named │
│ ---  ┆ ---        ┆ ---   │
│ str  ┆ str        ┆ str   │
╞══════╪════════════╪═══════╡
│ hat  ┆ bad        ┆ bad   │
│ hut  ┆ bud        ┆ bud   │
└──────┴────────────┴───────┘
Apply case-insensitive string replacement using the (?i) flag.

>>> df = pl.DataFrame(
...     {
...         "city": "Philadelphia",
...         "season": ["Spring", "Summer", "Autumn", "Winter"],
...         "weather": ["Rainy", "Sunny", "Cloudy", "Snowy"],
...     }
... )
>>> df.with_columns(
...     # apply case-insensitive string replacement
...     pl.col("weather").str.replace_all(
...         r"(?i)foggy|rainy|cloudy|snowy", "Sunny"
...     )
... )
shape: (4, 3)
┌──────────────┬────────┬─────────┐
│ city         ┆ season ┆ weather │
│ ---          ┆ ---    ┆ ---     │
│ str          ┆ str    ┆ str     │
╞══════════════╪════════╪═════════╡
│ Philadelphia ┆ Spring ┆ Sunny   │
│ Philadelphia ┆ Summer ┆ Sunny   │
│ Philadelphia ┆ Autumn ┆ Sunny   │
│ Philadelphia ┆ Winter ┆ Sunny   │
└──────────────┴────────┴─────────┘