polars.Expr.cat.len_bytes

Expr.cat.len_bytes() → Expr

Return the byte-length of the string representation of each value.

Returns:
Expr

Expression of data type UInt32.

See also

len_chars

Notes

When working with non-ASCII text, the length in bytes is not the same as the length in characters, so you may want to use len_chars() instead. Note that len_bytes() is much more performant (O(1)) than len_chars() (O(n)).

Examples

>>> df = pl.DataFrame(
...     {"a": pl.Series(["Café", "345", "東京", None], dtype=pl.Categorical)}
... )
>>> df.with_columns(
...     pl.col("a").cat.len_bytes().alias("n_bytes"),
...     pl.col("a").cat.len_chars().alias("n_chars"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_bytes ┆ n_chars │
│ ---  ┆ ---     ┆ ---     │
│ cat  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 5       ┆ 4       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 6       ┆ 2       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘
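The byte and character counts in the table above can be cross-checked in plain Python. This is a minimal sketch, assuming the strings are stored as UTF-8; the helper name `utf8_lengths` is hypothetical and not part of Polars.

```python
# Cross-check of byte length vs. character length, assuming UTF-8 encoding.
# `utf8_lengths` is an illustrative helper, not a Polars function.

def utf8_lengths(s: str) -> tuple[int, int]:
    """Return (number of UTF-8 bytes, number of Unicode code points)."""
    return len(s.encode("utf-8")), len(s)


for value in ["Café", "345", "東京"]:
    n_bytes, n_chars = utf8_lengths(value)
    print(f"{value!r}: n_bytes={n_bytes}, n_chars={n_chars}")
```

ASCII characters take one byte each, so the two counts agree for "345". Characters outside ASCII take two to four bytes in UTF-8, which is why the counts diverge for the other values.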