polars.LazyFrame.groupby
- LazyFrame.groupby(by: IntoExpr | Iterable[IntoExpr], *more_by: IntoExpr, maintain_order: bool = False)
Start a groupby operation.
- Parameters:
- by
Column(s) to group by. Accepts expression input. Strings are parsed as column names.
- *more_by
Additional columns to group by, specified as positional arguments.
- maintain_order
Ensure that the order of the groups is consistent with the input data. This is slower than a default groupby. Setting this to True prevents the query from running on the streaming engine.
Examples
Group by one column and call agg to compute the grouped sum of another column.
>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "c"],
...         "b": [1, 2, 1, 3, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> lf.groupby("a").agg(pl.col("b").sum()).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘
Set maintain_order=True to ensure the order of the groups is consistent with the input.
>>> lf.groupby("a", maintain_order=True).agg(pl.col("c")).collect()
shape: (3, 2)
┌─────┬───────────┐
│ a   ┆ c         │
│ --- ┆ ---       │
│ str ┆ list[i64] │
╞═════╪═══════════╡
│ a   ┆ [5, 3]    │
│ b   ┆ [4, 2]    │
│ c   ┆ [1]       │
└─────┴───────────┘
Group by multiple columns by passing a list of column names.
>>> lf.groupby(["a", "b"]).agg(pl.max("c")).collect()
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘
Or use positional arguments to group by multiple columns in the same way. Expressions are also accepted.
>>> lf.groupby("a", pl.col("b") // 2).agg(
...     pl.col("c").mean()
... ).collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ a   ┆ 0   ┆ 4.0 │
│ b   ┆ 1   ┆ 3.0 │
│ c   ┆ 1   ┆ 1.0 │
└─────┴─────┴─────┘