polars.Expr.str.extract_groups

Expr.str.extract_groups(pattern: str) → Expr

Extract all capture groups for the given regex pattern.

Parameters:
pattern

A valid regular expression pattern, compatible with the Rust regex crate.

Returns:
Expr

Expression of data type Struct with fields of data type Utf8.

Notes

All group names are strings.

If your pattern contains unnamed groups, their numerical position is converted to a string.

For example, here we access groups 2 and 3 via the names “2” and “3”:

>>> df = pl.DataFrame({"col": ["foo bar baz"]})
>>> (
...     df.with_columns(
...         pl.col("col").str.extract_groups(r"(\S+) (\S+) (.+)")
...     ).select(pl.col("col").struct["2"], pl.col("col").struct["3"])
... )
shape: (1, 2)
┌─────┬─────┐
│ 2   ┆ 3   │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪═════╡
│ bar ┆ baz │
└─────┴─────┘

Examples

>>> df = pl.DataFrame(
...     data={
...         "url": [
...             "http://vote.com/ballon_dor?candidate=messi&ref=python",
...             "http://vote.com/ballon_dor?candidate=weghorst&ref=polars",
...             "http://vote.com/ballon_dor?error=404&ref=rust",
...         ]
...     }
... )
>>> pattern = r"candidate=(?<candidate>\w+)&ref=(?<ref>\w+)"
>>> df.select(captures=pl.col("url").str.extract_groups(pattern)).unnest(
...     "captures"
... )
shape: (3, 2)
┌───────────┬────────┐
│ candidate ┆ ref    │
│ ---       ┆ ---    │
│ str       ┆ str    │
╞═══════════╪════════╡
│ messi     ┆ python │
│ weghorst  ┆ polars │
│ null      ┆ null   │
└───────────┴────────┘

Unnamed groups have their numerical position converted to a string:

>>> pattern = r"candidate=(\w+)&ref=(\w+)"
>>> (
...     df.with_columns(
...         captures=pl.col("url").str.extract_groups(pattern)
...     ).with_columns(name=pl.col("captures").struct["1"].str.to_uppercase())
... )
shape: (3, 3)
┌───────────────────────────────────┬───────────────────────┬──────────┐
│ url                               ┆ captures              ┆ name     │
│ ---                               ┆ ---                   ┆ ---      │
│ str                               ┆ struct[2]             ┆ str      │
╞═══════════════════════════════════╪═══════════════════════╪══════════╡
│ http://vote.com/ballon_dor?candi… ┆ {"messi","python"}    ┆ MESSI    │
│ http://vote.com/ballon_dor?candi… ┆ {"weghorst","polars"} ┆ WEGHORST │
│ http://vote.com/ballon_dor?error… ┆ {null,null}           ┆ null     │
└───────────────────────────────────┴───────────────────────┴──────────┘
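The field-naming rule can also be mimicked in plain Python, which may help when reasoning about which struct field names a pattern will produce. The following is only a rough sketch built on the standard-library re module, not how Polars implements this method. The helper name extract_groups_py is hypothetical, the treatment of patterns that mix named and unnamed groups is an assumption of this sketch, and Python's re spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re
from typing import Optional


def extract_groups_py(pattern: str, text: str) -> dict[str, Optional[str]]:
    """Pure-Python analogue of the field-naming rule (illustrative only)."""
    compiled = re.compile(pattern)
    # Map group index -> group name for the named groups in the pattern.
    names_by_index = {index: name for name, index in compiled.groupindex.items()}
    # Named groups keep their name; unnamed groups use their position as a string.
    field_names = [
        names_by_index.get(i, str(i)) for i in range(1, compiled.groups + 1)
    ]
    match = compiled.search(text)
    if match is None:
        # No match: every field is None, analogous to a row of nulls.
        return {name: None for name in field_names}
    return {name: match.group(i) for i, name in enumerate(field_names, start=1)}


# Unnamed groups are keyed by "1", "2", "3".
print(extract_groups_py(r"(\S+) (\S+) (.+)", "foo bar baz"))
# A non-matching input yields None for every field.
print(extract_groups_py(r"candidate=(?P<candidate>\w+)&ref=(\w+)", "error=404&ref=rust"))
```

The first call prints a dict keyed by "1", "2" and "3"; the second prints a dict whose values are all None because the pattern does not match.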