polars.Series.str.normalize

Series.str.normalize(form: UnicodeForm = 'NFC') → Series

Returns the Unicode normal form of the string values.

This uses the normalization forms defined in Unicode Standard Annex #15 (Unicode Normalization Forms): <https://www.unicode.org/reports/tr15/>.
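Python's standard-library `unicodedata.normalize` implements the same four UAX #15 forms, so it offers a quick way to preview what each `form` does to a given string. The sketch below assumes Polars and Python agree for these particular characters (both follow UAX #15, although their bundled Unicode versions may differ); the sample string is made up for illustration.

```python
import unicodedata

# Sample with a superscript digit "²" (U+00B2), a precomposed "é" (U+00E9)
# and a fullwidth "Ａ" (U+FF21).
sample = "01\u00b2 \u00e9 \uff21"

# Apply each UAX #15 form to the same input.
results = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFKC", "NFD", "NFKD")
}

# ascii() escapes non-ASCII code points so the differences are visible:
# the "D" forms split "é" into "e" + U+0301 (combining acute accent), and
# the "K" forms also map "²" to "2" and "Ａ" to "A".
for form, out in results.items():
    print(form, ascii(out))
# NFC '01\xb2 \xe9 \uff21'
# NFKC '012 \xe9 A'
# NFD '01\xb2 e\u0301 \uff21'
# NFKD '012 e\u0301 A'
```

The input is already in NFC, which is why the NFC result matches it unchanged.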

Parameters:
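form : {‘NFC’, ‘NFKC’, ‘NFD’, ‘NFKD’}

Unicode normalization form to use; defaults to ‘NFC’. The “D” forms decompose characters (canonical decomposition), and the “C” forms decompose and then recompose them. The “K” forms additionally apply compatibility mappings, for example “²” to “2” and fullwidth Latin letters to their ASCII counterparts.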

Examples

>>> s = pl.Series(["01²", "ＫＡＤＯＫＡＷＡ"])
>>> s.str.normalize("NFC")
shape: (2,)
Series: '' [str]
[
        "01²"
        "ＫＡＤＯＫＡＷＡ"
]
>>> s.str.normalize("NFKC")
shape: (2,)
Series: '' [str]
[
        "012"
        "KADOKAWA"
]