polars.Expr.str.find

Expr.str.find(
pattern: str | Expr,
*,
literal: bool = False,
strict: bool = True,
) -> Expr

Return the index position (as a byte offset into the string) of the first substring matching a pattern.

If the pattern is not found, returns None.

Parameters:
pattern

A valid regular expression pattern, compatible with the regex crate.

literal

Treat pattern as a literal string, not as a regular expression.

strict

If True, raise an error when the underlying pattern is not a valid regex. If False, mask the result with a null value instead.

See also

contains

Check if the string contains a substring that matches a pattern.

Notes

To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax. For example:

>>> pl.DataFrame({"s": ["AAA", "aAa", "aaa"]}).with_columns(
...     default_match=pl.col("s").str.find("Aa"),
...     insensitive_match=pl.col("s").str.find("(?i)Aa"),
... )
shape: (3, 3)
┌─────┬───────────────┬───────────────────┐
│ s   ┆ default_match ┆ insensitive_match │
│ --- ┆ ---           ┆ ---               │
│ str ┆ u32           ┆ u32               │
╞═════╪═══════════════╪═══════════════════╡
│ AAA ┆ null          ┆ 0                 │
│ aAa ┆ 1             ┆ 0                 │
│ aaa ┆ null          ┆ 0                 │
└─────┴───────────────┴───────────────────┘

See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.

Examples

>>> df = pl.DataFrame(
...     {
...         "txt": ["Crab", "Lobster", None, "Crustacean"],
...         "pat": ["a[bc]", "b.t", "[aeiuo]", "(?i)A[BC]"],
...     }
... )

Find the index of the first substring matching a regex or literal pattern:

>>> df.select(
...     pl.col("txt"),
...     pl.col("txt").str.find("a|e").alias("a|e (regex)"),
...     pl.col("txt").str.find("e", literal=True).alias("e (lit)"),
... )
shape: (4, 3)
┌────────────┬─────────────┬─────────┐
│ txt        ┆ a|e (regex) ┆ e (lit) │
│ ---        ┆ ---         ┆ ---     │
│ str        ┆ u32         ┆ u32     │
╞════════════╪═════════════╪═════════╡
│ Crab       ┆ 2           ┆ null    │
│ Lobster    ┆ 5           ┆ 5       │
│ null       ┆ null        ┆ null    │
│ Crustacean ┆ 5           ┆ 7       │
└────────────┴─────────────┴─────────┘

Match against a pattern found in another column (or expression):

>>> df.with_columns(pl.col("txt").str.find(pl.col("pat")).alias("find_pat"))
shape: (4, 3)
┌────────────┬───────────┬──────────┐
│ txt        ┆ pat       ┆ find_pat │
│ ---        ┆ ---       ┆ ---      │
│ str        ┆ str       ┆ u32      │
╞════════════╪═══════════╪══════════╡
│ Crab       ┆ a[bc]     ┆ 2        │
│ Lobster    ┆ b.t       ┆ 2        │
│ null       ┆ [aeiuo]   ┆ null     │
│ Crustacean ┆ (?i)A[BC] ┆ 5        │
└────────────┴───────────┴──────────┘