polars.map_groups

polars.map_groups(
exprs: Sequence[str | Expr],
function: Callable[[Sequence[Series]], Series | Any],
return_dtype: PolarsDataType | DataTypeExpr | None = None,
*,
is_elementwise: bool = False,
returns_scalar: bool = False,
) -> Expr

Apply a custom/user-defined function (UDF) in a GroupBy context.

Warning

This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.

Parameters:
exprs

Expression(s) representing the input Series to the function.

function

Function to apply over the input. It receives the input Series as a single sequence argument and should be of type Callable[[Sequence[Series]], Series | Any].

return_dtype

Datatype of the output Series.

It is recommended to set this whenever possible. If this is None, Polars tries to infer the datatype by calling the function with dummy data and inspecting the output.

is_elementwise

Set to True if the operation is elementwise, for better performance and optimization.

An elementwise operation has unit or equal length for all inputs and can be run sequentially on slices without the results being affected.

returns_scalar

Whether the function returns a single scalar as output.

Returns:
Expr

Expression with the data type given by return_dtype.

Notes

A UDF passed to map_groups must be pure, meaning that it cannot modify or depend on state other than its arguments. Polars may call the function with arbitrary input data.

Examples

>>> df = pl.DataFrame(
...     {
...         "group": [1, 1, 2],
...         "a": [1, 3, 3],
...         "b": [5, 6, 7],
...     }
... )
>>> df
shape: (3, 3)
┌───────┬─────┬─────┐
│ group ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 1     ┆ 1   ┆ 5   │
│ 1     ┆ 3   ┆ 6   │
│ 2     ┆ 3   ┆ 7   │
└───────┴─────┴─────┘
>>> (
...     df.group_by("group").agg(
...         pl.map_groups(
...             exprs=["a", "b"],
...             function=lambda list_of_series: list_of_series[0]
...             / list_of_series[0].sum()
...             + list_of_series[1],
...             return_dtype=pl.Float64,
...         ).alias("my_custom_aggregation")
...     )
... ).sort("group")
shape: (2, 2)
┌───────┬───────────────────────┐
│ group ┆ my_custom_aggregation │
│ ---   ┆ ---                   │
│ i64   ┆ list[f64]             │
╞═══════╪═══════════════════════╡
│ 1     ┆ [5.25, 6.75]          │
│ 2     ┆ [8.0]                 │
└───────┴───────────────────────┘

The output for group 1 can be understood as follows:

  • group 1 contains Series 'a': [1, 3] and 'b': [5, 6]

  • applying the function to those lists of Series, one gets the output [1 / 4 + 5, 3 / 4 + 6], i.e. [5.25, 6.75]