polars.Expr.cat.len_chars
Expr.cat.len_chars() → Expr
Return the number of characters of the string representation of each value.
Returns:
    Expr
        Expression of data type UInt32.
See also
len_bytes()
Notes
When working with ASCII text, use len_bytes() instead to achieve equivalent output with much better performance: len_bytes() runs in O(1), while len_chars() runs in O(n).

A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and by at most 4 bytes otherwise.
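The distinction between characters and bytes can be illustrated without Polars. The following is a minimal sketch in plain Python, assuming UTF-8 encoding; the helper name `char_and_byte_lengths` is hypothetical and not part of the Polars API. Python's `len()` on a `str` counts code points, which matches the Unicode scalar value count for well-formed text.

```python
from typing import Optional, Tuple


def char_and_byte_lengths(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (character count, UTF-8 byte count), or None for a missing value.

    Hypothetical helper for illustration only; it is not a Polars function.
    """
    if value is None:
        return None
    n_chars = len(value)                  # one per Unicode code point
    n_bytes = len(value.encode("utf-8"))  # 1 to 4 bytes per code point
    return n_chars, n_bytes


samples = ["Café", "345", "東京", None]
results = [char_and_byte_lengths(s) for s in samples]

# For ASCII-only text both counts agree, which is why len_bytes()
# can stand in for len_chars() in that case.
ascii_counts_agree = all(
    r[0] == r[1] for s, r in zip(samples, results) if s is not None and s.isascii()
)
```

Here "é" occupies 2 bytes and each of "東" and "京" occupies 3 bytes in UTF-8, so the byte count exceeds the character count for the non-ASCII strings.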
Examples
>>> df = pl.DataFrame(
...     {"a": pl.Series(["Café", "345", "東京", None], dtype=pl.Categorical)}
... )
>>> df.with_columns(
...     pl.col("a").cat.len_chars().alias("n_chars"),
...     pl.col("a").cat.len_bytes().alias("n_bytes"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_chars ┆ n_bytes │
│ ---  ┆ ---     ┆ ---     │
│ cat  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 4       ┆ 5       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 2       ┆ 6       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘