polars.Expr.bottom_k_by
Expr.bottom_k_by(
    by: IntoExpr | Iterable[IntoExpr],
    k: int | IntoExprColumn = 5,
    *,
    reverse: bool | Sequence[bool] = False,
) -> Expr
Return the elements corresponding to the k smallest elements of the by column(s).

Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; call sort() after this function if you wish the output to be sorted.

This has time complexity:

\[O(n \log{n})\]

- Parameters:
- by
Column(s) used to determine the smallest elements. Accepts expression input. Strings are parsed as column names.
- k
Number of elements to return.
- reverse
Consider the k largest elements of the by column(s) (instead of the k smallest). This can be specified per column by passing a sequence of booleans.
Examples
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 4, 5, 6],
...         "b": [6, 5, 4, 3, 2, 1],
...         "c": ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
...     }
... )
>>> df
shape: (6, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ c      │
│ --- ┆ --- ┆ ---    │
│ i64 ┆ i64 ┆ str    │
╞═════╪═════╪════════╡
│ 1   ┆ 6   ┆ Apple  │
│ 2   ┆ 5   ┆ Orange │
│ 3   ┆ 4   ┆ Apple  │
│ 4   ┆ 3   ┆ Apple  │
│ 5   ┆ 2   ┆ Banana │
│ 6   ┆ 1   ┆ Banana │
└─────┴─────┴────────┘
Get the bottom 2 rows by column a or b.

>>> df.select(
...     pl.all().bottom_k_by("a", 2).name.suffix("_btm_by_a"),
...     pl.all().bottom_k_by("b", 2).name.suffix("_btm_by_b"),
... )
shape: (2, 6)
┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ a_btm_by_a ┆ b_btm_by_a ┆ c_btm_by_a ┆ a_btm_by_b ┆ b_btm_by_b ┆ c_btm_by_b │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│ i64        ┆ i64        ┆ str        ┆ i64        ┆ i64        ┆ str        │
╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡
│ 1          ┆ 6          ┆ Apple      ┆ 6          ┆ 1          ┆ Banana     │
│ 2          ┆ 5          ┆ Orange     ┆ 5          ┆ 2          ┆ Banana     │
└────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘
Get the bottom 2 rows by multiple columns with given order.

>>> df.select(
...     pl.all()
...     .bottom_k_by(["c", "a"], 2, reverse=[False, True])
...     .name.suffix("_by_ca"),
...     pl.all()
...     .bottom_k_by(["c", "b"], 2, reverse=[False, True])
...     .name.suffix("_by_cb"),
... )
shape: (2, 6)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ a_by_ca ┆ b_by_ca ┆ c_by_ca ┆ a_by_cb ┆ b_by_cb ┆ c_by_cb │
│ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ i64     ┆ i64     ┆ str     ┆ i64     ┆ i64     ┆ str     │
╞═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ 4       ┆ 3       ┆ Apple   ┆ 1       ┆ 6       ┆ Apple   │
│ 3       ┆ 4       ┆ Apple   ┆ 3       ┆ 4       ┆ Apple   │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Get the bottom 2 rows by column a in each group.

>>> (
...     df.group_by("c", maintain_order=True)
...     .agg(pl.all().bottom_k_by("a", 2))
...     .explode(pl.all().exclude("c"))
... )
shape: (5, 3)
┌────────┬─────┬─────┐
│ c      ┆ a   ┆ b   │
│ ---    ┆ --- ┆ --- │
│ str    ┆ i64 ┆ i64 │
╞════════╪═════╪═════╡
│ Apple  ┆ 1   ┆ 6   │
│ Apple  ┆ 3   ┆ 4   │
│ Orange ┆ 2   ┆ 5   │
│ Banana ┆ 5   ┆ 2   │
│ Banana ┆ 6   ┆ 1   │
└────────┴─────┴─────┘