polars.Expr.str.extract_many

Expr.str.extract_many(
patterns: IntoExpr,
*,
ascii_case_insensitive: bool = False,
overlapping: bool = False,
leftmost: bool = False,
) → Expr

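Use the Aho-Corasick algorithm to extract all matches of multiple literal patterns from each string. The result is a list[str] column holding the matches found in each row.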

Parameters:
patterns

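String patterns to search for. This can be a list of literals applied to every row, or an expression (such as a column name) that yields a list of patterns per row.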

ascii_case_insensitive

Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.

overlapping

Whether matches may overlap.

leftmost

When several patterns could match at overlapping positions, guarantee that the leftmost match is used. If multiple patterns are candidates for the leftmost match, the one that comes first in patterns is used. Cannot be used together with overlapping=True.

See also

replace_many

Notes

This method supports matching on string literals only, and does not support regular expression matching.

Examples

>>> import polars as pl
>>> _ = pl.Config.set_fmt_str_lengths(100)
>>> df = pl.DataFrame({"values": ["discontent"]})
>>> patterns = ["winter", "disco", "onte", "discontent"]
>>> df.with_columns(
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=False)
...     .alias("matches"),
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=True)
...     .alias("matches_overlapping"),
... )
shape: (1, 3)
┌────────────┬───────────┬─────────────────────────────────┐
│ values     ┆ matches   ┆ matches_overlapping             │
│ ---        ┆ ---       ┆ ---                             │
│ str        ┆ list[str] ┆ list[str]                       │
╞════════════╪═══════════╪═════════════════════════════════╡
│ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"] │
└────────────┴───────────┴─────────────────────────────────┘
>>> df = pl.DataFrame(
...     {
...         "values": ["discontent", "rhapsody"],
...         "patterns": [
...             ["winter", "disco", "onte", "discontent"],
...             ["rhap", "ody", "coalesce"],
...         ],
...     }
... )
>>> df.select(pl.col("values").str.extract_many("patterns"))
shape: (2, 1)
┌─────────────────┐
│ values          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["disco"]       │
│ ["rhap", "ody"] │
└─────────────────┘