polars.Expr.str.replace
- Expr.str.replace(pattern, value, *, literal=False, n=1) → Expr
Replace first matching regex/literal substring with a new string value.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
- value
String that will replace the matched substring.
- literal
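Treat `pattern` as a literal string rather than a regular expression. In this mode `value` is also inserted as-is, so a `$` in it needs no escaping.
- n
Number of matches to replace (defaults to 1, so only the first match is replaced).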
See also
`Expr.str.replace_all`, which replaces every match instead of only the first.
Notes
To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline `(?iLmsuxU)` syntax. See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.

The dollar sign (`$`) is a special character related to capture groups. If you want to replace some target pattern with characters that include a literal `$`, you should escape it by doubling it up as `$$`, or set `literal=True` if you do not need a full regular expression pattern match. Otherwise, you will be referencing a (potentially non-existent) capture group, and a reference to a group that does not exist expands to an empty string.

In the example below we need to double up `$` (to represent a literal dollar sign) and then refer to the capture group using `$n` or `${n}`, hence the three consecutive `$` characters in the replacement value:

```python
>>> import polars as pl
>>> df = pl.DataFrame({"cost": ["#12.34", "#56.78"]})
>>> df.with_columns(
...     cost_usd=pl.col("cost").str.replace(r"#(\d+)", "$$${1}")
... )
shape: (2, 2)
┌────────┬──────────┐
│ cost   ┆ cost_usd │
│ ---    ┆ ---      │
│ str    ┆ str      │
╞════════╪══════════╡
│ #12.34 ┆ $12.34   │
│ #56.78 ┆ $56.78   │
└────────┴──────────┘
```
Examples
```python
>>> import polars as pl
>>> df = pl.DataFrame({"id": [1, 2], "text": ["123abc", "abc456"]})
>>> df.with_columns(pl.col("text").str.replace(r"abc\b", "ABC"))
shape: (2, 2)
┌─────┬────────┐
│ id  ┆ text   │
│ --- ┆ ---    │
│ i64 ┆ str    │
╞═════╪════════╡
│ 1   ┆ 123ABC │
│ 2   ┆ abc456 │
└─────┴────────┘
```
Capture groups are supported. Use `$1` or `${1}` in the `value` string to refer to the first capture group in the `pattern`, `$2` or `${2}` to refer to the second capture group, and so on. You can also use named capture groups. Prefer the braced form when the reference is immediately followed by a letter, digit, or underscore: `$1d` is read as a reference to a group named `1d`, whereas `${1}d` inserts group 1 followed by `d`.

```python
>>> import polars as pl
>>> df = pl.DataFrame({"word": ["hat", "hut"]})
>>> df.with_columns(
...     positional=pl.col.word.str.replace("h(.)t", "b${1}d"),
...     named=pl.col.word.str.replace("h(?<vowel>.)t", "b${vowel}d"),
... )
shape: (2, 3)
┌──────┬────────────┬───────┐
│ word ┆ positional ┆ named │
│ ---  ┆ ---        ┆ ---   │
│ str  ┆ str        ┆ str   │
╞══════╪════════════╪═══════╡
│ hat  ┆ bad        ┆ bad   │
│ hut  ┆ bud        ┆ bud   │
└──────┴────────────┴───────┘
```
Apply case-insensitive string replacement using the `(?i)` flag.

```python
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "city": "Philadelphia",
...         "season": ["Spring", "Summer", "Autumn", "Winter"],
...         "weather": ["Rainy", "Sunny", "Cloudy", "Snowy"],
...     }
... )
>>> df.with_columns(
...     pl.col("weather").str.replace(r"(?i)foggy|rainy|cloudy|snowy", "Sunny")
... )
shape: (4, 3)
┌──────────────┬────────┬─────────┐
│ city         ┆ season ┆ weather │
│ ---          ┆ ---    ┆ ---     │
│ str          ┆ str    ┆ str     │
╞══════════════╪════════╪═════════╡
│ Philadelphia ┆ Spring ┆ Sunny   │
│ Philadelphia ┆ Summer ┆ Sunny   │
│ Philadelphia ┆ Autumn ┆ Sunny   │
│ Philadelphia ┆ Winter ┆ Sunny   │
└──────────────┴────────┴─────────┘
```