polars.map_groups
polars.map_groups(
    exprs: Sequence[str | Expr],
    function: Callable[[Sequence[Series]], Series | Any],
    return_dtype: PolarsDataType | None = None,
    *,
    returns_scalar: bool = True,
) -> Expr
Apply a custom/user-defined function (UDF) in a GroupBy context.
Warning
This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
- Parameters:
- exprs
Expression(s) representing the input Series to the function.
- function
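Function to apply over the input. It receives the input Series as a single sequence, one Series per entry in exprs, and should be of type Callable[[Sequence[Series]], Series | Any].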
- return_dtype
dtype of the output Series.
- returns_scalar
Whether the function returns a single scalar as output.
- Returns:
- Expr
Expression with the data type given by return_dtype.
Examples
>>> df = pl.DataFrame(
...     {
...         "group": [1, 1, 2],
...         "a": [1, 3, 3],
...         "b": [5, 6, 7],
...     }
... )
>>> df
shape: (3, 3)
┌───────┬─────┬─────┐
│ group ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 1     ┆ 1   ┆ 5   │
│ 1     ┆ 3   ┆ 6   │
│ 2     ┆ 3   ┆ 7   │
└───────┴─────┴─────┘
>>> (
...     df.group_by("group").agg(
...         pl.map_groups(
...             exprs=["a", "b"],
...             function=lambda list_of_series: list_of_series[0]
...             / list_of_series[0].sum()
...             + list_of_series[1],
...             return_dtype=pl.Float64,
...         ).alias("my_custom_aggregation")
...     )
... ).sort("group")
shape: (2, 2)
┌───────┬───────────────────────┐
│ group ┆ my_custom_aggregation │
│ ---   ┆ ---                   │
│ i64   ┆ list[f64]             │
╞═══════╪═══════════════════════╡
│ 1     ┆ [5.25, 6.75]          │
│ 2     ┆ [8.0]                 │
└───────┴───────────────────────┘
The output for group 1 can be understood as follows:
- group 1 contains Series 'a': [1, 3] and 'b': [5, 6]
- applying the function to those lists of Series, one gets the output [1 / 4 + 5, 3 / 4 + 6], i.e. [5.25, 6.75]
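The arithmetic above can be checked without Polars. The sketch below mimics the lambda from the example on plain Python lists; `custom_aggregation` is a hypothetical helper written for this illustration and is not part of the Polars API.

```python
def custom_aggregation(list_of_columns):
    """Plain-list analogue of the lambda in the example: a / sum(a) + b."""
    a, b = list_of_columns
    total = sum(a)
    # Element-wise: divide each value of a by the group total, then add b.
    return [x / total + y for x, y in zip(a, b)]


# Group 1 holds a = [1, 3] and b = [5, 6]; group 2 holds a = [3] and b = [7].
group_1 = custom_aggregation([[1, 3], [5, 6]])
group_2 = custom_aggregation([[3], [7]])

print(group_1)  # [5.25, 6.75]
print(group_2)  # [8.0]
```

The same reasoning gives the output for group 2: the single value 3 is divided by the group total 3 and added to 7, yielding [8.0].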