polars.Expr.str.slice

Expr.str.slice(
offset: int | IntoExprColumn,
length: int | IntoExprColumn | None = None,
) → Expr

Extract a substring from each string value.

Parameters:
offset

Start index. A negative value counts from the end of the string.

length

Length of the slice. If set to None (default), the slice is taken to the end of the string.

Returns:
Expr

Expression of data type String.

Notes

Both the offset and length inputs are measured in characters of the (UTF-8) string, not in bytes. A character is defined as a Unicode scalar value. A single character is represented by a single byte when working with ASCII text, and by up to 4 bytes otherwise.
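
The difference between character and byte offsets can be illustrated without Polars. The sketch below uses Python's built-in str, which also indexes by code point, as a stand-in for the character-based behavior described above; the sample text is purely illustrative.

```python
# Plain-Python illustration (not Polars itself): character counts vs. UTF-8 bytes.
text = "na\u00efve caf\u00e9"  # "naïve café", with two non-ASCII characters

n_chars = len(text)                  # counts characters (code points)
n_bytes = len(text.encode("utf-8"))  # counts UTF-8 bytes

# Slicing by character keeps "ï" intact.
char_slice = text[2:5]

# Slicing the encoded bytes at the same offsets covers a different span,
# because "ï" occupies two bytes in UTF-8.
byte_slice = text.encode("utf-8")[2:5].decode("utf-8")

# A character outside the Basic Multilingual Plane takes 4 bytes.
emoji_bytes = len("\U0001F34D".encode("utf-8"))

print(n_chars, n_bytes, char_slice, byte_slice, emoji_bytes)
```

Here the character slice covers three characters, while the byte slice at the same offsets covers only two, which is why offset and length are best thought of as character positions.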

Examples

>>> df = pl.DataFrame(
...     {
...         "s": ["pear", None, "papaya", "dragonfruit"],
...         "idx": [1, 3, 5, 7],
...     }
... )
>>> df.select("s", substr=pl.col("s").str.slice(-3))
shape: (4, 2)
┌─────────────┬────────┐
│ s           ┆ substr │
│ ---         ┆ ---    │
│ str         ┆ str    │
╞═════════════╪════════╡
│ pear        ┆ ear    │
│ null        ┆ null   │
│ papaya      ┆ aya    │
│ dragonfruit ┆ uit    │
└─────────────┴────────┘

Using the optional length parameter and passing offset as an expression:

>>> df.with_columns(substr=pl.col("s").str.slice("idx", length=3))
shape: (4, 3)
┌─────────────┬─────┬────────┐
│ s           ┆ idx ┆ substr │
│ ---         ┆ --- ┆ ---    │
│ str         ┆ i64 ┆ str    │
╞═════════════╪═════╪════════╡
│ pear        ┆ 1   ┆ ear    │
│ null        ┆ 3   ┆ null   │
│ papaya      ┆ 5   ┆ a      │
│ dragonfruit ┆ 7   ┆ rui    │
└─────────────┴─────┴────────┘