polars.map_groups
- polars.map_groups(
- exprs: Sequence[str | Expr],
- function: Callable[[Sequence[Series]], Series | Any],
- return_dtype: PolarsDataType | DataTypeExpr | None = None,
- *,
- is_elementwise: bool = False,
- returns_scalar: bool = False,
- ) → Expr
Apply a custom/user-defined function (UDF) in a GroupBy context.
Warning
This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
- Parameters:
- exprs
Expression(s) representing the input Series to the function.
- function
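Function to apply over the input. It receives a sequence of Series, one per expression in exprs, and should return a Series (or a scalar), i.e. it should be of type Callable[[Sequence[Series]], Series | Any].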
- return_dtype
Datatype of the output Series.
It is recommended to set this whenever possible. If this is None, Polars tries to infer the datatype by calling the function with dummy data and looking at the output.
- is_elementwise
Set to True if the operation is elementwise, for better performance and optimization.
An elementwise operation has unit or equal length for all inputs and can be run sequentially on slices without the results being affected.
- returns_scalar
Set to True if the function returns a single scalar as output.
- Returns:
- Expr
Expression with the data type given by return_dtype.
Notes
A UDF passed to map_groups must be pure, meaning that it cannot modify or depend on state other than its arguments. Polars may call the function with arbitrary input data.
Examples
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "group": [1, 1, 2],
...         "a": [1, 3, 3],
...         "b": [5, 6, 7],
...     }
... )
>>> df
shape: (3, 3)
┌───────┬─────┬─────┐
│ group ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 1     ┆ 1   ┆ 5   │
│ 1     ┆ 3   ┆ 6   │
│ 2     ┆ 3   ┆ 7   │
└───────┴─────┴─────┘
>>> (
...     df.group_by("group").agg(
...         pl.map_groups(
...             exprs=["a", "b"],
...             function=lambda list_of_series: list_of_series[0]
...             / list_of_series[0].sum()
...             + list_of_series[1],
...             return_dtype=pl.Float64,
...         ).alias("my_custom_aggregation")
...     )
... ).sort("group")
shape: (2, 2)
┌───────┬───────────────────────┐
│ group ┆ my_custom_aggregation │
│ ---   ┆ ---                   │
│ i64   ┆ list[f64]             │
╞═══════╪═══════════════════════╡
│ 1     ┆ [5.25, 6.75]          │
│ 2     ┆ [8.0]                 │
└───────┴───────────────────────┘
The output for group 1 can be understood as follows:
- group 1 contains Series 'a': [1, 3] and 'b': [5, 6]
- applying the function to those lists of Series, one gets the output [1 / 4 + 5, 3 / 4 + 6], i.e. [5.25, 6.75]