polars.Series.str.find_many

Series.str.find_many(
patterns: IntoExpr,
*,
ascii_case_insensitive: bool = False,
overlapping: bool = False,
leftmost: bool = False,
) → Series

Use the Aho-Corasick algorithm to find all matches.

The function returns the byte offset of the start of each match. The return type is List<UInt32>.
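
Because the offsets are byte offsets rather than character indices, they can differ from Python string indices when the text contains non-ASCII characters. The following is a minimal standard-library sketch of that difference, assuming the offsets refer to the UTF-8 encoding of the string; `byte_to_char_index` is a hypothetical helper, not part of Polars.

```python
def byte_to_char_index(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Python character index.

    Hypothetical helper for illustration; assumes the offset falls on a
    character boundary, as a match start always does.
    """
    # Decode only the bytes before the offset and count the characters.
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "h\u00e9llo w\u00f6rld"  # "héllo wörld"
pattern = "w\u00f6rld"  # "wörld"

# Byte offset of the match, computed on the UTF-8 encoding.
byte_offset = text.encode("utf-8").find(pattern.encode("utf-8"))
char_index = byte_to_char_index(text, byte_offset)

print(byte_offset)  # 7, because "é" occupies two bytes
print(char_index)   # 6, the same as text.find(pattern)
```

For ASCII-only data the two values coincide, so the conversion only matters when slicing Python strings with offsets taken from multi-byte text.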

Parameters:
patterns

String patterns to search for.

ascii_case_insensitive

Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

overlapping

Whether matches may overlap.

leftmost

When matches overlap, guarantees that the leftmost match is used. If there are multiple candidates for the leftmost match, the pattern that comes first in patterns is used. Cannot be combined with overlapping=True.

Notes

This method supports matching on string literals only, and does not support regular expression matching.

Examples

>>> _ = pl.Config.set_fmt_str_lengths(100)
>>> df = pl.DataFrame({"values": ["discontent"]})
>>> patterns = ["winter", "disco", "onte", "discontent"]
>>> df.with_columns(
...     pl.col("values")
...     .str.find_many(patterns, overlapping=False)
...     .alias("matches"),
...     pl.col("values")
...     .str.find_many(patterns, overlapping=True)
...     .alias("matches_overlapping"),
... )
shape: (1, 3)
┌────────────┬───────────┬─────────────────────┐
│ values     ┆ matches   ┆ matches_overlapping │
│ ---        ┆ ---       ┆ ---                 │
│ str        ┆ list[u32] ┆ list[u32]           │
╞════════════╪═══════════╪═════════════════════╡
│ discontent ┆ [0]       ┆ [0, 4, 0]           │
└────────────┴───────────┴─────────────────────┘
>>> df = pl.DataFrame(
...     {
...         "values": ["discontent", "rhapsody"],
...         "patterns": [
...             ["winter", "disco", "onte", "discontent"],
...             ["rhap", "ody", "coalesce"],
...         ],
...     }
... )
>>> df.select(pl.col("values").str.find_many("patterns"))
shape: (2, 1)
┌───────────┐
│ values    │
│ ---       │
│ list[u32] │
╞═══════════╡
│ [0]       │
│ [0, 5]    │
└───────────┘