polars.Series.str.extract

Series.str.extract(pattern: str, group_index: int = 1) → Series

Extract the target capture group from provided patterns.

Parameters:
pattern

A valid regular expression pattern, compatible with the regex crate.

group_index

Index of the targeted capture group. Group 0 means the whole pattern; the first capture group begins at index 1. Defaults to the first capture group.

Returns:
Series

Series of data type Utf8. A value is null if the original value is null or if the regex captures nothing for it.

Notes

To modify regular expression behaviour (such as multi-line matching) with flags, use the inline (?iLmsuxU) syntax. For example:

>>> s = pl.Series(
...     name="lines",
...     values=[
...         "I Like\nThose\nOdds",
...         "This is\nThe Way",
...     ],
... )
>>> s.str.extract(r"(?m)^(T\w+)", 1).alias("matches")
shape: (2,)
Series: 'matches' [str]
[
    "Those"
    "This"
]

See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.

Examples

>>> s = pl.Series(
...     name="url",
...     values=[
...         "http://vote.com/ballon_dor?ref=polars&candidate=messi",
...         "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
...         "http://vote.com/ballon_dor?error=404&ref=unknown",
...     ],
... )
>>> s.str.extract(r"candidate=(\w+)", 1).alias("candidate")
shape: (3,)
Series: 'candidate' [str]
[
    "messi"
    "ronaldo"
    null
]