polars.Expr.str.find
- Expr.str.find(pattern, *, literal=False, strict=True) → Expr
Return the index position of the first substring matching a pattern.
If the pattern is not found, returns None.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
- literal
Treat pattern as a literal string, not as a regular expression.
- strict
Raise an error if the underlying pattern is not a valid regex, otherwise mask out with a null value.
See also
contains
Check if the string contains a substring that matches a pattern.
Notes
To modify regular expression behaviour (such as case-sensitivity) with flags, use the inline (?iLmsuxU) syntax. For example:
>>> pl.DataFrame({"s": ["AAA", "aAa", "aaa"]}).with_columns(
...     default_match=pl.col("s").str.find("Aa"),
...     insensitive_match=pl.col("s").str.find("(?i)Aa"),
... )
shape: (3, 3)
┌─────┬───────────────┬───────────────────┐
│ s   ┆ default_match ┆ insensitive_match │
│ --- ┆ ---           ┆ ---               │
│ str ┆ u32           ┆ u32               │
╞═════╪═══════════════╪═══════════════════╡
│ AAA ┆ null          ┆ 0                 │
│ aAa ┆ 1             ┆ 0                 │
│ aaa ┆ null          ┆ 0                 │
└─────┴───────────────┴───────────────────┘
See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.
Examples
>>> df = pl.DataFrame(
...     {
...         "txt": ["Crab", "Lobster", None, "Crustacean"],
...         "pat": ["a[bc]", "b.t", "[aeiuo]", "(?i)A[BC]"],
...     }
... )
Find the index of the first substring matching a regex or literal pattern:
>>> df.select(
...     pl.col("txt"),
...     pl.col("txt").str.find("a|e").alias("a|e (regex)"),
...     pl.col("txt").str.find("e", literal=True).alias("e (lit)"),
... )
shape: (4, 3)
┌────────────┬─────────────┬─────────┐
│ txt        ┆ a|e (regex) ┆ e (lit) │
│ ---        ┆ ---         ┆ ---     │
│ str        ┆ u32         ┆ u32     │
╞════════════╪═════════════╪═════════╡
│ Crab       ┆ 2           ┆ null    │
│ Lobster    ┆ 5           ┆ 5       │
│ null       ┆ null        ┆ null    │
│ Crustacean ┆ 5           ┆ 7       │
└────────────┴─────────────┴─────────┘
Match against a pattern found in another column (or expression):
>>> df.with_columns(pl.col("txt").str.find(pl.col("pat")).alias("find_pat"))
shape: (4, 3)
┌────────────┬───────────┬──────────┐
│ txt        ┆ pat       ┆ find_pat │
│ ---        ┆ ---       ┆ ---      │
│ str        ┆ str       ┆ u32      │
╞════════════╪═══════════╪══════════╡
│ Crab       ┆ a[bc]     ┆ 2        │
│ Lobster    ┆ b.t       ┆ 2        │
│ null       ┆ [aeiuo]   ┆ null     │
│ Crustacean ┆ (?i)A[BC] ┆ 5        │
└────────────┴───────────┴──────────┘