polars.Expr.over#
- Expr.over(
- partition_by: IntoExpr | Iterable[IntoExpr],
- *more_exprs: IntoExpr,
- order_by: IntoExpr | Iterable[IntoExpr] | None = None,
- mapping_strategy: WindowMappingStrategy = 'group_to_rows',
Compute expressions over the given groups.
This expression is similar to performing a group by aggregation and joining the result back into the original DataFrame.
The outcome is similar to how window functions work in PostgreSQL.
- Parameters:
- partition_by
Column(s) to group by. Accepts expression input. Strings are parsed as column names.
- *more_exprs
Additional columns to group by, specified as positional arguments.
- order_by:
Order the window functions/aggregations with the partitioned groups by the result of the expression passed to
order_by
.- mapping_strategy: {‘group_to_rows’, ‘join’, ‘explode’}
- group_to_rows
If the aggregation results in multiple values, assign them back to their position in the DataFrame. This can only be done if the group yields the same elements before aggregation as after.
- join
Join the groups as ‘List<group_dtype>’ to the row positions. warning: this can be memory intensive.
- explode
Explodes the grouped data into new rows, similar to the results of
group_by
+agg
+explode
. Sorting of the given groups is required if the groups are not part of the window operation for the operation, otherwise the result would not make sense. This operation changes the number of rows.
Examples
Pass the name of a column to compute the expression over that column.
>>> df = pl.DataFrame( ... { ... "a": ["a", "a", "b", "b", "b"], ... "b": [1, 2, 3, 5, 3], ... "c": [5, 4, 3, 2, 1], ... } ... ) >>> df.with_columns(c_max=pl.col("c").max().over("a")) shape: (5, 4) ┌─────┬─────┬─────┬───────┐ │ a ┆ b ┆ c ┆ c_max │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═══════╡ │ a ┆ 1 ┆ 5 ┆ 5 │ │ a ┆ 2 ┆ 4 ┆ 5 │ │ b ┆ 3 ┆ 3 ┆ 3 │ │ b ┆ 5 ┆ 2 ┆ 3 │ │ b ┆ 3 ┆ 1 ┆ 3 │ └─────┴─────┴─────┴───────┘
Expression input is also supported.
>>> df.with_columns(c_max=pl.col("c").max().over(pl.col("b") // 2)) shape: (5, 4) ┌─────┬─────┬─────┬───────┐ │ a ┆ b ┆ c ┆ c_max │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═══════╡ │ a ┆ 1 ┆ 5 ┆ 5 │ │ a ┆ 2 ┆ 4 ┆ 4 │ │ b ┆ 3 ┆ 3 ┆ 4 │ │ b ┆ 5 ┆ 2 ┆ 2 │ │ b ┆ 3 ┆ 1 ┆ 4 │ └─────┴─────┴─────┴───────┘
Group by multiple columns by passing multiple column names or expressions.
>>> df.with_columns(c_min=pl.col("c").min().over("a", pl.col("b") % 2)) shape: (5, 4) ┌─────┬─────┬─────┬───────┐ │ a ┆ b ┆ c ┆ c_min │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═══════╡ │ a ┆ 1 ┆ 5 ┆ 5 │ │ a ┆ 2 ┆ 4 ┆ 4 │ │ b ┆ 3 ┆ 3 ┆ 1 │ │ b ┆ 5 ┆ 2 ┆ 1 │ │ b ┆ 3 ┆ 1 ┆ 1 │ └─────┴─────┴─────┴───────┘
You can use non-elementwise expressions with
over
too. By default they are evaluated using row-order, but you can specify a different one usingorder_by
.>>> from datetime import date >>> df = pl.DataFrame( ... { ... "store_id": ["a", "a", "b", "b"], ... "date": [ ... date(2024, 9, 18), ... date(2024, 9, 17), ... date(2024, 9, 18), ... date(2024, 9, 16), ... ], ... "sales": [7, 9, 8, 10], ... } ... ) >>> df.with_columns( ... cumulative_sales=pl.col("sales") ... .cum_sum() ... .over("store_id", order_by="date") ... ) shape: (4, 4) ┌──────────┬────────────┬───────┬──────────────────┐ │ store_id ┆ date ┆ sales ┆ cumulative_sales │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ date ┆ i64 ┆ i64 │ ╞══════════╪════════════╪═══════╪══════════════════╡ │ a ┆ 2024-09-18 ┆ 7 ┆ 16 │ │ a ┆ 2024-09-17 ┆ 9 ┆ 9 │ │ b ┆ 2024-09-18 ┆ 8 ┆ 18 │ │ b ┆ 2024-09-16 ┆ 10 ┆ 10 │ └──────────┴────────────┴───────┴──────────────────┘
If you don’t require that the group order be preserved, then the more performant option is to use
mapping_strategy='explode'
- be careful however to only ever use this in aselect
statement, not awith_columns
one.>>> window = { ... "partition_by": "store_id", ... "order_by": "date", ... "mapping_strategy": "explode", ... } >>> df.select( ... pl.all().over(**window), ... cumulative_sales=pl.col("sales").cum_sum().over(**window), ... ) shape: (4, 4) ┌──────────┬────────────┬───────┬──────────────────┐ │ store_id ┆ date ┆ sales ┆ cumulative_sales │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ date ┆ i64 ┆ i64 │ ╞══════════╪════════════╪═══════╪══════════════════╡ │ a ┆ 2024-09-17 ┆ 9 ┆ 9 │ │ a ┆ 2024-09-18 ┆ 7 ┆ 16 │ │ b ┆ 2024-09-16 ┆ 10 ┆ 10 │ │ b ┆ 2024-09-18 ┆ 8 ┆ 18 │ └──────────┴────────────┴───────┴──────────────────┘