polars.DataFrame.top_k#
- DataFrame.top_k(
- k: int,
- *,
- by: IntoExpr | Iterable[IntoExpr],
- descending: bool | Sequence[bool] = False,
- nulls_last: bool | Sequence[bool] | None = None,
- maintain_order: bool | None = None,
Return the
k
largest rows.- Parameters:
- k
Number of rows to return.
- by
Column(s) used to determine the top rows. Accepts expression input. Strings are parsed as column names.
- descending
Consider the
k
smallest elements of theby
column(s) (instead of thek
largest). This can be specified per column by passing a sequence of booleans.- nulls_last
Place null values last.
Deprecated since version 0.20.31: This parameter will be removed in the next breaking release. Null values will be considered lowest priority and will only be included if
k
is larger than the number of non-null elements.- maintain_order
Whether the order should be maintained if elements are equal. Note that if
true
streaming is not possible and performance might be worse since this requires a stable search.Deprecated since version 0.20.31: This parameter will be removed in the next breaking release. There will be no guarantees about the order of the output.
See also
Examples
>>> df = pl.DataFrame( ... { ... "a": ["a", "b", "a", "b", "b", "c"], ... "b": [2, 1, 1, 3, 2, 1], ... } ... )
Get the rows which contain the 4 largest values in column b.
>>> df.top_k(4, by="b") shape: (4, 2) ┌─────┬─────┐ │ a ┆ b │ │ --- ┆ --- │ │ str ┆ i64 │ ╞═════╪═════╡ │ b ┆ 3 │ │ a ┆ 2 │ │ b ┆ 2 │ │ b ┆ 1 │ └─────┴─────┘
Get the rows which contain the 4 largest values when sorting on column b and a.
>>> df.top_k(4, by=["b", "a"]) shape: (4, 2) ┌─────┬─────┐ │ a ┆ b │ │ --- ┆ --- │ │ str ┆ i64 │ ╞═════╪═════╡ │ b ┆ 3 │ │ b ┆ 2 │ │ a ┆ 2 │ │ c ┆ 1 │ └─────┴─────┘