polars.Series.str.find_many

Series.str.find_many(patterns: IntoExpr, *, ascii_case_insensitive: bool = False, overlapping: bool = False, leftmost: bool = False) → Series

Use the Aho-Corasick algorithm to find all matches.

The function returns the byte offset of the start of each match. The return type is List<UInt32>.
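Because the offsets are counted in bytes, they can differ from character positions when the text contains non-ASCII characters. The sketch below uses plain Python rather than Polars to illustrate the difference, assuming UTF-8 encoded text; the helper name `byte_to_char_offset` is hypothetical and not part of any library.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a character (code point) index."""
    # Decode only the bytes before the offset and count the characters.
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "café au lait"

# "é" takes two bytes in UTF-8, so the byte offset of "au" is one larger
# than its character index.
byte_offset = text.encode("utf-8").find(b"au")
char_offset = byte_to_char_offset(text, byte_offset)

print(byte_offset)  # 6
print(char_offset)  # 5
```

A conversion like this is only needed when the offsets are later used to index Python strings; for pure ASCII data the two values are identical.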

Parameters:

patterns: String patterns to search for.
ascii_case_insensitive: Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.
overlapping: Whether matches may overlap.
leftmost: When candidate matches overlap, guarantee that the leftmost match is used. If several patterns are candidates for the leftmost match, the pattern that comes first in patterns is used. Cannot be combined with overlapping=True.

Notes

This method supports matching on string literals only, and does not support regular expression matching.

Examples

>>> _ = pl.Config.set_fmt_str_lengths(100)
>>> df = pl.DataFrame({"values": ["discontent"]})
>>> patterns = ["winter", "disco", "onte", "discontent"]
>>> df.with_columns(
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=False)
...     .alias("matches"),
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=True)
...     .alias("matches_overlapping"),
... )
shape: (1, 3)
┌────────────┬───────────┬─────────────────────────────────┐
│ values     ┆ matches   ┆ matches_overlapping             │
│ ---        ┆ ---       ┆ ---                             │
│ str        ┆ list[str] ┆ list[str]                       │
╞════════════╪═══════════╪═════════════════════════════════╡
│ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"] │
└────────────┴───────────┴─────────────────────────────────┘
>>> df = pl.DataFrame(
...     {
...         "values": ["discontent", "rhapsody"],
...         "patterns": [
...             ["winter", "disco", "onte", "discontent"],
...             ["rhap", "ody", "coalesce"],
...         ],
...     }
... )
>>> df.select(pl.col("values").str.find_many("patterns"))
shape: (2, 1)
┌───────────┐
│ values    │
│ ---       │
│ list[u32] │
╞═══════════╡
│ [0]       │
│ [0, 5]    │
└───────────┘