polars.Expr.str.normalize
- Expr.str.normalize(form: UnicodeForm = 'NFC') → Expr
Returns the Unicode normal form of the string values.
This uses the forms described in Unicode Standard Annex 15: <https://www.unicode.org/reports/tr15/>.
- Parameters:
- form : {‘NFC’, ‘NFKC’, ‘NFD’, ‘NFKD’}
Unicode normalization form to apply. Defaults to ‘NFC’.
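To make the four forms concrete, here is a minimal sketch using Python's standard-library `unicodedata` module rather than Polars itself. It assumes, since both follow Unicode Standard Annex 15, that `str.normalize` yields the same results; the sample string is made up for illustration.

```python
import unicodedata

# "e" followed by a combining acute accent (U+0301), then the "fi" ligature (U+FB01).
s = "e\u0301\ufb01"

# Normalize the same input with each of the four forms.
forms = {
    form: unicodedata.normalize(form, s)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, value in forms.items():
    # Print code points so that visually identical results can be told apart.
    print(form, [f"U+{ord(ch):04X}" for ch in value])
```

NFC and NFD only compose or decompose canonically equivalent sequences (the accent), while NFKC and NFKD additionally replace compatibility characters such as the ligature with their plain equivalents.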
Examples
>>> import polars as pl
>>> df = pl.DataFrame({"text": ["01²", "ＫＡＤＯＫＡＷＡ"]})
>>> new = df.with_columns(
...     nfc=pl.col("text").str.normalize("NFC"),
...     nfkc=pl.col("text").str.normalize("NFKC"),
... )
>>> new
shape: (2, 3)
┌──────────────────┬──────────────────┬──────────┐
│ text             ┆ nfc              ┆ nfkc     │
│ ---              ┆ ---              ┆ ---      │
│ str              ┆ str              ┆ str      │
╞══════════════════╪══════════════════╪══════════╡
│ 01²              ┆ 01²              ┆ 012      │
│ ＫＡＤＯＫＡＷＡ ┆ ＫＡＤＯＫＡＷＡ ┆ KADOKAWA │
└──────────────────┴──────────────────┴──────────┘
>>> new.select(pl.all().str.len_bytes())
shape: (2, 3)
┌──────┬─────┬──────┐
│ text ┆ nfc ┆ nfkc │
│ ---  ┆ --- ┆ ---  │
│ u32  ┆ u32 ┆ u32  │
╞══════╪═════╪══════╡
│ 4    ┆ 4   ┆ 3    │
│ 24   ┆ 24  ┆ 8    │
└──────┴─────┴──────┘
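The byte counts in the example can be cross-checked without Polars. The sketch below uses only the standard library and assumes that the second input string consists of fullwidth Latin letters (three bytes each in UTF-8), which is what a length of 24 bytes for eight characters implies. The helper `utf8_len` is a hypothetical name introduced here.

```python
import unicodedata

# Assumption: "01" are ASCII digits followed by SUPERSCRIPT TWO (U+00B2),
# and the second string is fullwidth "KADOKAWA" (U+FF21..U+FF3A range).
texts = [
    "01\u00b2",
    "\uff2b\uff21\uff24\uff2f\uff2b\uff21\uff37\uff21",
]


def utf8_len(s: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


# One (text, nfc, nfkc) tuple of byte lengths per input string.
rows = [
    (
        utf8_len(t),
        utf8_len(unicodedata.normalize("NFC", t)),
        utf8_len(unicodedata.normalize("NFKC", t)),
    )
    for t in texts
]
print(rows)
```

NFC leaves both strings unchanged because neither contains a canonically decomposable sequence, whereas NFKC maps the superscript and the fullwidth letters to single-byte ASCII characters.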