polars.Expr.cat.len_bytes
- Expr.cat.len_bytes() → Expr
Return the byte-length of the string representation of each value.
- Returns:
- Expr
Expression of data type UInt32.
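As a rough model of what this method returns, the sketch below assumes that the "string representation" of each category is its UTF-8 encoding (Polars stores string data as UTF-8) and that null values stay null. The helpers `byte_lengths` and `char_lengths` are hypothetical, plain Python, and not part of Polars; they only mirror the semantics of `len_bytes()` and `len_chars()` on a list of values.

```python
from typing import List, Optional


def byte_lengths(values: List[Optional[str]]) -> List[Optional[int]]:
    # Hypothetical model of Expr.cat.len_bytes():
    # number of bytes in the UTF-8 encoding; null stays null.
    return [None if v is None else len(v.encode("utf-8")) for v in values]


def char_lengths(values: List[Optional[str]]) -> List[Optional[int]]:
    # Hypothetical model of Expr.cat.len_chars():
    # number of Unicode code points; null stays null.
    return [None if v is None else len(v) for v in values]


# "Caf\u00e9" is "Café" with a precomposed é (U+00E9).
sample = ["Caf\u00e9", "345", "東京", None]
print(byte_lengths(sample))  # [5, 3, 6, None]
print(char_lengths(sample))  # [4, 3, 2, None]
```

The two lists differ exactly on the non-ASCII values, which is the distinction the Notes below describe.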
See also
len_chars()
Notes
When working with non-ASCII text, the length in bytes is not the same as the length in characters. You may want to use len_chars() instead. Note that len_bytes() is much more performant (O(1) per value) than len_chars() (O(n) in the length of the string).
Examples
>>> import polars as pl
>>> df = pl.DataFrame(
...     {"a": pl.Series(["Café", "345", "東京", None], dtype=pl.Categorical)}
... )
>>> df.with_columns(
...     pl.col("a").cat.len_bytes().alias("n_bytes"),
...     pl.col("a").cat.len_chars().alias("n_chars"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_bytes ┆ n_chars │
│ ---  ┆ ---     ┆ ---     │
│ cat  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 5       ┆ 4       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 6       ┆ 2       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘
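The O(1) versus O(n) difference mentioned in the Notes can be illustrated without Polars. Assuming each value is held as a UTF-8 buffer whose length is already known (as it is for a Python `bytes` object), the byte length is a single lookup, while the character count has to inspect every byte and skip UTF-8 continuation bytes (those whose top two bits are `10`). Both helpers below are hypothetical and only illustrate the asymptotics; this is not Polars' actual implementation.

```python
def utf8_len_bytes(buf: bytes) -> int:
    # The length is stored alongside the buffer, so this is O(1).
    return len(buf)


def utf8_len_chars(buf: bytes) -> int:
    # Every byte must be visited, so this is O(n) in the byte length.
    # A byte starts a new code point unless it is a continuation byte,
    # i.e. unless (b & 0xC0) == 0x80.
    return sum(1 for b in buf if (b & 0xC0) != 0x80)


for text in ["Caf\u00e9", "345", "東京"]:
    buf = text.encode("utf-8")
    print(text, utf8_len_bytes(buf), utf8_len_chars(buf))
# Café 5 4
# 345 3 3
# 東京 6 2
```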