polars.Expr.str.len_chars
Expr.str.len_chars() → Expr
Return the length of each string as the number of characters.
- Returns:
- Expr
Expression of data type UInt32.
See also
Notes
When working with ASCII text, use len_bytes() instead to achieve equivalent output with much better performance: len_bytes() runs in O(1), while len_chars() runs in O(n).

A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and a maximum of 4 bytes otherwise.
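The relationship between characters and bytes described above can be illustrated without Polars. The following is a minimal sketch in plain Python, not Polars' implementation: it assumes UTF-8 encoding and relies on Python's `len()` counting code points, which coincide with Unicode scalar values for well-formed text. The helper name `char_and_byte_lengths` is hypothetical.

```python
def char_and_byte_lengths(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))


# One sample per UTF-8 width, plus a decomposed accent (two scalar values).
samples = [
    "a",           # ASCII: 1 byte
    "\u00e9",      # precomposed e-acute: 2 bytes
    "\u6771",      # CJK ideograph: 3 bytes
    "\U0001F600",  # emoji outside the Basic Multilingual Plane: 4 bytes
    "e\u0301",     # 'e' followed by a combining acute accent
]

for text in samples:
    n_chars, n_bytes = char_and_byte_lengths(text)
    print(f"{text!r}: {n_chars} chars, {n_bytes} bytes")
```

The last sample renders as a single glyph but counts as two characters, because each Unicode scalar value is counted separately. For ASCII-only input the two counts always agree, which is why the byte length can stand in for the character length in that case.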
Examples
>>> import polars as pl
>>> df = pl.DataFrame({"a": ["Café", "345", "東京", None]})
>>> df.with_columns(
...     pl.col("a").str.len_chars().alias("n_chars"),
...     pl.col("a").str.len_bytes().alias("n_bytes"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_chars ┆ n_bytes │
│ ---  ┆ ---     ┆ ---     │
│ str  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 4       ┆ 5       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 2       ┆ 6       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘