polars.Expr.cat.len_chars

Expr.cat.len_chars() → Expr

Return the number of characters of the string representation of each value.

Returns:
Expr

Expression of data type UInt32.

See also

len_bytes

Notes

When working with ASCII text, use len_bytes() instead: it produces the same output with much better performance, because len_bytes() runs in O(1) per value, while len_chars() runs in O(n), where n is the length of the string.
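To illustrate why counting characters costs more than counting bytes, here is a plain-Python sketch. It is not Polars' implementation; `utf8_char_count` is a hypothetical helper. It counts characters in UTF-8 encoded data by skipping continuation bytes (bit pattern `10xxxxxx`), which means visiting every byte. The byte length, by contrast, is simply the stored length of the buffer. For ASCII text every byte is a leading byte, so both counts agree.

```python
def utf8_char_count(data: bytes) -> int:
    """Count Unicode scalar values in valid UTF-8 bytes (hypothetical helper)."""
    # Each character has exactly one leading byte; continuation bytes
    # match 10xxxxxx. Counting non-continuation bytes visits every byte: O(n).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


for text in ["Café", "345", "東京"]:
    encoded = text.encode("utf-8")
    # len() on a bytes object reads a stored length: O(1).
    print(text, utf8_char_count(encoded), len(encoded))
```

The printed character and byte counts match the `n_chars` and `n_bytes` columns in the example at the end of this page.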

A character is defined as a Unicode scalar value. In UTF-8, a single character is represented by a single byte when working with ASCII text, and by up to 4 bytes otherwise.
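As a quick check of the 1-to-4-byte range, the plain-Python snippet below (independent of Polars) encodes one scalar value from each UTF-8 width class. Escape sequences are used so that the exact code points are unambiguous. Python's `len()` on a `str` counts code points, which coincides with scalar values for ordinary text.

```python
# One Unicode scalar value per string; each needs a different number of UTF-8 bytes.
samples = [
    "a",           # U+0061, ASCII
    "\u00e9",      # U+00E9, é
    "\u6771",      # U+6771, 東
    "\U0001F600",  # U+1F600, emoji outside the Basic Multilingual Plane
]

widths = [len(ch.encode("utf-8")) for ch in samples]
for ch, width in zip(samples, widths):
    # Each sample is one character but occupies `width` bytes.
    print(f"U+{ord(ch):04X}: {len(ch)} char, {width} byte(s)")
```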

Examples

>>> df = pl.DataFrame(
...     {"a": pl.Series(["Café", "345", "東京", None], dtype=pl.Categorical)}
... )
>>> df.with_columns(
...     pl.col("a").cat.len_chars().alias("n_chars"),
...     pl.col("a").cat.len_bytes().alias("n_bytes"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_chars ┆ n_bytes │
│ ---  ┆ ---     ┆ ---     │
│ cat  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 4       ┆ 5       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 2       ┆ 6       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘