polars.Expr.str.extract_many
Expr.str.extract_many(
    patterns: IntoExpr,
    *,
    ascii_case_insensitive: bool = False,
    overlapping: bool = False,
    leftmost: bool = False,
) -> Expr
Use the Aho-Corasick algorithm to extract many matches.
- Parameters:
- patterns
String patterns to search.
- ascii_case_insensitive
Enable ASCII-aware case-insensitive matching. When this option is enabled, searching will be performed without respect to case for ASCII letters (a-z and A-Z) only.
- overlapping
Whether matches may overlap.
- leftmost
When candidate matches overlap, guarantee that the leftmost match is returned. If several patterns match at that leftmost position, the pattern that comes first in patterns is used. Cannot be combined with overlapping=True.
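The difference between the default, overlapping, and leftmost behaviour is easier to see in a small model. The sketch below is a naive pure-Python approximation, not how Polars implements the method: the function name extract_many_model and its mode argument are hypothetical, and the tie-breaking rule in the default mode is an assumption. It reproduces the default and overlapping results shown in the Examples section of this page.

```python
def extract_many_model(haystack, patterns, mode="standard"):
    """Hypothetical reference model of the match modes (not Polars' code)."""
    patterns = [p for p in patterns if p]  # this sketch ignores empty patterns
    matches = []

    if mode == "overlapping":
        # Report every occurrence of every pattern, ordered by end position.
        for end in range(1, len(haystack) + 1):
            matches.extend(p for p in patterns if haystack.endswith(p, 0, end))
        return matches

    pos = 0
    while pos < len(haystack):
        if mode == "leftmost":
            # Earliest start wins; ties go to the pattern listed first.
            hit = next((p for p in patterns if haystack.startswith(p, pos)), None)
            if hit is None:
                pos += 1
                continue
            matches.append(hit)
            pos += len(hit)
        else:
            # Default: the match that ends earliest wins, then the search
            # resumes after it. Breaking ties by list order is an assumption.
            best = None
            for idx, p in enumerate(patterns):
                start = haystack.find(p, pos)
                if start != -1:
                    candidate = (start + len(p), idx, p)
                    if best is None or candidate < best:
                        best = candidate
            if best is None:
                break
            matches.append(best[2])
            pos = best[0]
    return matches


patterns = ["winter", "disco", "onte", "discontent"]
default = extract_many_model("discontent", patterns)
overlapping = extract_many_model("discontent", patterns, mode="overlapping")

# A haystack where the default and leftmost modes disagree in this model.
short_first = extract_many_model("abcd", ["b", "abc", "abcd"])
leftmost = extract_many_model("abcd", ["b", "abc", "abcd"], mode="leftmost")
```

In this model the default mode returns "b" for the haystack "abcd" because that match ends first, while the leftmost mode returns "abc" because it starts earliest and is listed before "abcd".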
Notes
This method supports matching on string literals only, and does not support regular expression matching.
Examples
>>> _ = pl.Config.set_fmt_str_lengths(100)
>>> df = pl.DataFrame({"values": ["discontent"]})
>>> patterns = ["winter", "disco", "onte", "discontent"]
>>> df.with_columns(
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=False)
...     .alias("matches"),
...     pl.col("values")
...     .str.extract_many(patterns, overlapping=True)
...     .alias("matches_overlapping"),
... )
shape: (1, 3)
┌────────────┬───────────┬─────────────────────────────────┐
│ values     ┆ matches   ┆ matches_overlapping             │
│ ---        ┆ ---       ┆ ---                             │
│ str        ┆ list[str] ┆ list[str]                       │
╞════════════╪═══════════╪═════════════════════════════════╡
│ discontent ┆ ["disco"] ┆ ["disco", "onte", "discontent"] │
└────────────┴───────────┴─────────────────────────────────┘

>>> df = pl.DataFrame(
...     {
...         "values": ["discontent", "rhapsody"],
...         "patterns": [
...             ["winter", "disco", "onte", "discontent"],
...             ["rhap", "ody", "coalesce"],
...         ],
...     }
... )
>>> df.select(pl.col("values").str.extract_many("patterns"))
shape: (2, 1)
┌─────────────────┐
│ values          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["disco"]       │
│ ["rhap", "ody"] │
└─────────────────┘