polars.Series.str.extract_all
- Series.str.extract_all(pattern: str | Series) → Series
Extract all matches for the given regex pattern.
Extract each successive non-overlapping regex match in an individual string as a list. If the haystack string is null, null is returned.
- Parameters:
- pattern
A valid regular expression pattern, compatible with the regex crate.
- Returns:
- Series
Series of data type List(Utf8).
Notes
To modify regular expression behaviour (such as “verbose” mode and/or case-sensitive matching) with flags, use the inline (?iLmsuxU) syntax. For example:
>>> import polars as pl
>>> s = pl.Series(
...     name="email",
...     values=[
...         "real.email@spam.com",
...         "some_account@somewhere.net",
...         "abc.def.ghi.jkl@uvw.xyz.co.uk",
...     ],
... )
>>> # extract name/domain parts from email, using verbose regex
>>> s.str.extract_all(
...     r"""(?xi)   # activate 'verbose' and 'case-insensitive' flags
...       [         # (start character group)
...         A-Z     # letters
...         0-9     # digits
...         ._%+\-  # special chars
...       ]         # (end character group)
...       +         # 'one or more' quantifier
...     """
... ).alias("email_parts")
shape: (3,)
Series: 'email_parts' [list[str]]
[
    ["real.email", "spam.com"]
    ["some_account", "somewhere.net"]
    ["abc.def.ghi.jkl", "uvw.xyz.co.uk"]
]
See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.
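As a rough illustration of what an inline modifier changes, the sketch below compares a pattern with and without a leading (?i). It uses Python's standard re module as a stand-in, not Polars or the regex crate: the two engines do not support the same set of flags and differ in details such as verbose-mode handling of character classes, so treat this only as a demonstration of the case-insensitive flag. The sample text and variable names are made up for the example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Spam SPAM spam"

# Without a flag, only the exact-case occurrence matches.
case_sensitive = [m.group(0) for m in re.finditer(r"spam", text)]

# A leading (?i) switches the whole pattern to case-insensitive matching.
case_insensitive = [m.group(0) for m in re.finditer(r"(?i)spam", text)]

print(case_sensitive)
print(case_insensitive)
```

The same (?i) prefix can be placed at the start of the pattern passed to extract_all, as the verbose example above does with (?xi).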
Examples
>>> s = pl.Series("foo", ["123 bla 45 asd", "xyz 678 910t", "bar", None])
>>> s.str.extract_all(r"\d+")
shape: (4,)
Series: 'foo' [list[str]]
[
    ["123", "45"]
    ["678", "910"]
    []
    null
]
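The documented behaviour (all non-overlapping matches per string, an empty list when nothing matches, null for a null input) can be approximated in plain Python. This is a minimal sketch for reasoning about results, not how Polars implements the method: extract_all_reference is a hypothetical helper, None stands in for null, and Python's re engine is used in place of the regex crate, so patterns relying on crate-specific syntax may behave differently.

```python
import re
from typing import List, Optional


def extract_all_reference(
    values: List[Optional[str]], pattern: str
) -> List[Optional[List[str]]]:
    """Approximate the documented extract_all semantics with the stdlib."""
    compiled = re.compile(pattern)
    result: List[Optional[List[str]]] = []
    for value in values:
        if value is None:
            # A null haystack yields null, not an empty list.
            result.append(None)
        else:
            # finditer scans left to right and never returns overlapping
            # matches; group(0) is the whole match, ignoring capture groups.
            result.append([m.group(0) for m in compiled.finditer(value)])
    return result


rows = extract_all_reference(
    ["123 bla 45 asd", "xyz 678 910t", "bar", None], r"\d+"
)
print(rows)
```

Using finditer with group(0) rather than re.findall is deliberate: findall returns capture-group contents when the pattern contains groups, whereas the sketch always collects the full match text.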