polars.Expr.over

Expr.over(
    partition_by: IntoExpr | Iterable[IntoExpr] | None = None,
    *more_exprs: IntoExpr,
    order_by: IntoExpr | Iterable[IntoExpr] | None = None,
    descending: bool = False,
    nulls_last: bool = False,
    mapping_strategy: WindowMappingStrategy = 'group_to_rows',
) -> Expr

Compute expressions over the given groups.

This expression is similar to performing a group by aggregation and joining the result back into the original DataFrame.

The outcome is similar to how window functions work in PostgreSQL.

Parameters:
partition_by

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

*more_exprs

Additional columns to group by, specified as positional arguments.

order_by

Order the window functions/aggregations within the partitioned groups by the result of the expression passed to order_by.

descending

In case ‘order_by’ is given, indicate whether to sort in descending rather than ascending order.

nulls_last

In case ‘order_by’ is given, indicate whether to place null values last in the ordering.

mapping_strategy: {‘group_to_rows’, ‘join’, ‘explode’}
  • group_to_rows

    If the aggregation results in multiple values per group, map them back to their row position in the DataFrame. This can only be done if each group yields the same elements before aggregation as after. If the aggregation results in one scalar value per group, this value will be mapped to every row.

  • join

    If the aggregation may result in multiple values per group, join the values as ‘List<group_dtype>’ to each row position. Warning: this can be memory intensive. If the aggregation always results in one scalar value per group, join this value as ‘<group_dtype>’ to each row position.

  • explode

    If the aggregation may result in multiple values per group, map each value to a new row, similar to the results of group_by + agg + explode. If the aggregation always results in one scalar value per group, map this value to one row position. Sorting of the given groups is required if the groups are not part of the window operation itself, otherwise the result would not make sense. This operation changes the number of rows.

Examples

Pass the name of a column to compute the expression over that column.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "a", "b", "b", "b"],
...         "b": [1, 2, 3, 5, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> df.with_columns(c_max=pl.col("c").max().over("a"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 5     │
│ b   ┆ 3   ┆ 3   ┆ 3     │
│ b   ┆ 5   ┆ 2   ┆ 3     │
│ b   ┆ 3   ┆ 1   ┆ 3     │
└─────┴─────┴─────┴───────┘
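
The result above can also be obtained by aggregating with group_by and joining the aggregation back onto the original frame. The following is a sketch of that equivalent formulation, not how Polars evaluates the window internally (note that the join does not necessarily preserve the original row order):

>>> agg = df.group_by("a").agg(c_max=pl.col("c").max())
>>> joined = df.join(agg, on="a", how="left")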

Expression input is also supported.

>>> df.with_columns(c_max=pl.col("c").max().over(pl.col("b") // 2))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 4     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 4     │
└─────┴─────┴─────┴───────┘

Group by multiple columns by passing multiple column names or expressions.

>>> df.with_columns(c_min=pl.col("c").min().over("a", pl.col("b") % 2))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 1     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘
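
Expressions that yield one value per input row, such as a rank computed within each group, also work with the default group_to_rows strategy: the results are mapped back to their original row positions.

>>> df.with_columns(c_rank=pl.col("c").rank("ordinal").over("a"))
shape: (5, 4)
┌─────┬─────┬─────┬────────┐
│ a   ┆ b   ┆ c   ┆ c_rank │
│ --- ┆ --- ┆ --- ┆ ---    │
│ str ┆ i64 ┆ i64 ┆ u32    │
╞═════╪═════╪═════╪════════╡
│ a   ┆ 1   ┆ 5   ┆ 2      │
│ a   ┆ 2   ┆ 4   ┆ 1      │
│ b   ┆ 3   ┆ 3   ┆ 3      │
│ b   ┆ 5   ┆ 2   ┆ 2      │
│ b   ┆ 3   ┆ 1   ┆ 1      │
└─────┴─────┴─────┴────────┘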

The ‘join’ mapping strategy attaches the aggregated values to every row of the group as a list.

>>> df.with_columns(
...     c_pairs=pl.col("c").head(2).over("a", mapping_strategy="join")
... )
shape: (5, 4)
┌─────┬─────┬─────┬───────────┐
│ a   ┆ b   ┆ c   ┆ c_pairs   │
│ --- ┆ --- ┆ --- ┆ ---       │
│ str ┆ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═════╪═══════════╡
│ a   ┆ 1   ┆ 5   ┆ [5, 4]    │
│ a   ┆ 2   ┆ 4   ┆ [5, 4]    │
│ b   ┆ 3   ┆ 3   ┆ [3, 2]    │
│ b   ┆ 5   ┆ 2   ┆ [3, 2]    │
│ b   ┆ 3   ┆ 1   ┆ [3, 2]    │
└─────┴─────┴─────┴───────────┘

The ‘explode’ mapping strategy maps the values to new rows, changing the shape of the output.

>>> df.select(
...     c_first_2=pl.col("c").head(2).over("a", mapping_strategy="explode")
... )
shape: (4, 1)
┌───────────┐
│ c_first_2 │
│ ---       │
│ i64       │
╞═══════════╡
│ 5         │
│ 4         │
│ 3         │
│ 2         │
└───────────┘

You can use non-elementwise expressions with over too. By default, they are evaluated in row order, but you can specify a different ordering using order_by.

>>> from datetime import date
>>> df = pl.DataFrame(
...     {
...         "store_id": ["a", "a", "b", "b"],
...         "date": [
...             date(2024, 9, 18),
...             date(2024, 9, 17),
...             date(2024, 9, 18),
...             date(2024, 9, 16),
...         ],
...         "sales": [7, 9, 8, 10],
...     }
... )
>>> df.with_columns(
...     cumulative_sales=pl.col("sales")
...     .cum_sum()
...     .over("store_id", order_by="date")
... )
shape: (4, 4)
┌──────────┬────────────┬───────┬──────────────────┐
│ store_id ┆ date       ┆ sales ┆ cumulative_sales │
│ ---      ┆ ---        ┆ ---   ┆ ---              │
│ str      ┆ date       ┆ i64   ┆ i64              │
╞══════════╪════════════╪═══════╪══════════════════╡
│ a        ┆ 2024-09-18 ┆ 7     ┆ 16               │
│ a        ┆ 2024-09-17 ┆ 9     ┆ 9                │
│ b        ┆ 2024-09-18 ┆ 8     ┆ 18               │
│ b        ┆ 2024-09-16 ┆ 10    ┆ 10               │
└──────────┴────────────┴───────┴──────────────────┘
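
Pass descending=True to reverse the ordering applied by order_by. In this sketch, the cumulative sum accumulates from the most recent date backwards, giving each store's sales from that date onward:

>>> df.with_columns(
...     remaining_sales=pl.col("sales")
...     .cum_sum()
...     .over("store_id", order_by="date", descending=True)
... )
shape: (4, 4)
┌──────────┬────────────┬───────┬─────────────────┐
│ store_id ┆ date       ┆ sales ┆ remaining_sales │
│ ---      ┆ ---        ┆ ---   ┆ ---             │
│ str      ┆ date       ┆ i64   ┆ i64             │
╞══════════╪════════════╪═══════╪═════════════════╡
│ a        ┆ 2024-09-18 ┆ 7     ┆ 7               │
│ a        ┆ 2024-09-17 ┆ 9     ┆ 16              │
│ b        ┆ 2024-09-18 ┆ 8     ┆ 8               │
│ b        ┆ 2024-09-16 ┆ 10    ┆ 18              │
└──────────┴────────────┴───────┴─────────────────┘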

If you don’t require that the original row order be preserved, the more performant option is to use mapping_strategy='explode'. Be careful, however, to only ever use this in a select statement, not a with_columns one.

>>> window = {
...     "partition_by": "store_id",
...     "order_by": "date",
...     "mapping_strategy": "explode",
... }
>>> df.select(
...     pl.all().over(**window),
...     cumulative_sales=pl.col("sales").cum_sum().over(**window),
... )
shape: (4, 4)
┌──────────┬────────────┬───────┬──────────────────┐
│ store_id ┆ date       ┆ sales ┆ cumulative_sales │
│ ---      ┆ ---        ┆ ---   ┆ ---              │
│ str      ┆ date       ┆ i64   ┆ i64              │
╞══════════╪════════════╪═══════╪══════════════════╡
│ a        ┆ 2024-09-17 ┆ 9     ┆ 9                │
│ a        ┆ 2024-09-18 ┆ 7     ┆ 16               │
│ b        ┆ 2024-09-16 ┆ 10    ┆ 10               │
│ b        ┆ 2024-09-18 ┆ 8     ┆ 18               │
└──────────┴────────────┴───────┴──────────────────┘