polars.Expr.over#

Expr.over(
expr: IntoExpr | Iterable[IntoExpr],
*more_exprs: IntoExpr,
mapping_strategy: WindowMappingStrategy = 'group_to_rows',
) Self[source]#

Compute expressions over the given groups.

This expression is similar to performing a group by aggregation and joining the result back into the original DataFrame.

The outcome is similar to how window functions work in PostgreSQL.

Parameters:
expr

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

*more_exprs

Additional columns to group by, specified as positional arguments.

mapping_strategy: {‘group_to_rows’, ‘join’, ‘explode’}
  • group_to_rows

    If the aggregation results in multiple values, assign them back to their position in the DataFrame. This can only be done if the group yields the same elements before aggregation as after.

  • join

    Join the groups as ‘List<group_dtype>’ to the row positions. warning: this can be memory intensive.

  • explode

    Don’t do any mapping, but simply flatten the group. This only makes sense if the input data is sorted.

Examples

Pass the name of a column to compute the expression over that column.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "a", "b", "b", "b"],
...         "b": [1, 2, 3, 5, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> df.with_columns(
...     pl.col("c").max().over("a").name.suffix("_max"),
... )
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 5     │
│ b   ┆ 3   ┆ 3   ┆ 3     │
│ b   ┆ 5   ┆ 2   ┆ 3     │
│ b   ┆ 3   ┆ 1   ┆ 3     │
└─────┴─────┴─────┴───────┘

Expression input is supported.

>>> df.with_columns(
...     pl.col("c").max().over(pl.col("b") // 2).name.suffix("_max"),
... )
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 4     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 4     │
└─────┴─────┴─────┴───────┘

Group by multiple columns by passing a list of column names or expressions.

>>> df.with_columns(
...     pl.col("c").min().over(["a", "b"]).name.suffix("_min"),
... )
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘

Or use positional arguments to group by multiple columns in the same way.

>>> df.with_columns(
...     pl.col("c").min().over("a", pl.col("b") % 2).name.suffix("_min"),
... )
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 1     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘