polars.Series.str.extract
- Series.str.extract(pattern: str, group_index: int = 1) → Series
Extract the target capture group from the provided pattern.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
- group_index
Index of the targeted capture group. Group 0 refers to the whole match; the first capture group begins at index 1. Defaults to the first capture group.
- Returns:
- Series
Series of data type Utf8. Contains null values if the original value is null or the regex captures nothing.
Notes
To modify regular expression behaviour (such as multi-line matching) with flags, use the inline (?iLmsuxU) syntax. For example:
>>> s = pl.Series(
...     name="lines",
...     values=[
...         "I Like\nThose\nOdds",
...         "This is\nThe Way",
...     ],
... )
>>> s.str.extract(r"(?m)^(T\w+)", 1).alias("matches")
shape: (2,)
Series: 'matches' [str]
[
	"Those"
	"This"
]
See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.
Examples
>>> s = pl.Series(
...     name="url",
...     values=[
...         "http://vote.com/ballon_dor?ref=polars&candidate=messi",
...         "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
...         "http://vote.com/ballon_dor?error=404&ref=unknown",
...     ],
... )
>>> s.str.extract(r"candidate=(\w+)", 1).alias("candidate")
shape: (3,)
Series: 'candidate' [str]
[
	"messi"
	"ronaldo"
	null
]