polars.DataFrame.top_k

Return the k largest rows.

Parameters:

k: Number of rows to return.
by: Column(s) used to determine the top rows. Accepts expression input. Strings are parsed as column names.
descending: Consider the k smallest elements of the by column(s) (instead of the k largest). This can be specified per column by passing a sequence of booleans.
nulls_last: Place null values last.

Deprecated since version 0.20.31: This parameter will be removed in the next breaking release. Null values will be considered lowest priority and will only be included if k is larger than the number of non-null elements.
maintain_order: Whether to maintain the original order of rows whose elements are equal. Note that if True, streaming is not possible and performance might be worse, since this requires a stable search.

Deprecated since version 0.20.31: This parameter will be removed in the next breaking release. There will be no guarantees about the order of the output.
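The ranking rules described by these parameters can be modelled in plain Python. The sketch below uses a hypothetical helper, top_k_rows, that operates on a list of dicts. It is not the Polars implementation and only illustrates the semantics of k, by, and descending (either a single boolean or one boolean per column). Following the deprecation note for nulls_last, it assumes that nulls (None) always rank lowest, whichever direction is requested.

```python
from typing import Any, Dict, List, Sequence, Union

Row = Dict[str, Any]


def top_k_rows(
    rows: List[Row],
    k: int,
    by: Union[str, Sequence[str]],
    descending: Union[bool, Sequence[bool]] = False,
) -> List[Row]:
    """Hypothetical pure-Python model of top_k semantics (not Polars code).

    Rows are ranked lexicographically on the `by` columns. For each column,
    descending=False prefers larger values and descending=True prefers
    smaller ones. None is assumed to rank lowest in both cases.
    """
    cols = [by] if isinstance(by, str) else list(by)
    flags = [descending] * len(cols) if isinstance(descending, bool) else list(descending)
    if len(flags) != len(cols):
        raise ValueError("descending must have one entry per `by` column")

    ranked = list(rows)
    # Stable sorts applied from the least to the most significant column
    # produce a lexicographic order over all `by` columns.
    for col, smallest_first in reversed(list(zip(cols, flags))):
        if smallest_first:
            # Ascending values; (True, None) sorts after (False, value), so nulls go last.
            ranked.sort(key=lambda r, c=col: (r[c] is None, r[c]))
        else:
            # Descending values; (False, None) sorts below (True, value), so nulls go last.
            ranked.sort(key=lambda r, c=col: (r[c] is not None, r[c]), reverse=True)
    return ranked[:k]
```

Modelling multi-column ranking as repeated stable sorts keeps the per-column direction logic simple; a real engine would more likely use a partial selection that avoids sorting every row.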

See also

bottom_k

Examples

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [2, 1, 1, 3, 2, 1],
...     }
... )

Get the rows which contain the 4 largest values in column b.

>>> df.top_k(4, by="b")
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ b   ┆ 3   │
│ a   ┆ 2   │
│ b   ┆ 2   │
│ b   ┆ 1   │
└─────┴─────┘
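Outside Polars, the same kind of selection on plain Python data can be sketched with the standard library's heapq.nlargest, which picks the n largest items without sorting the whole input. This is only an analogy for the example above, not a description of how Polars computes top_k. Which of the rows tied at b = 1 gets selected is a tie-breaking detail of this sketch; Polars does not guarantee the order of tied rows.

```python
import heapq

# The example frame above, written as (a, b) tuples.
rows = [("a", 2), ("b", 1), ("a", 1), ("b", 3), ("b", 2), ("c", 1)]

# Select the 4 rows with the largest value in column b (tuple index 1).
top4 = heapq.nlargest(4, rows, key=lambda row: row[1])
print(top4)
```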

Get the rows which contain the 4 largest values when sorting on columns b and a.

>>> df.top_k(4, by=["b", "a"])
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ b   ┆ 3   │
│ b   ┆ 2   │
│ a   ┆ 2   │
│ c   ┆ 1   │
└─────┴─────┘
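With several by columns, rows are compared lexicographically: first on b, and only among rows tied on b, on a. In plain Python this corresponds to ranking with a tuple key. The sketch below is again an analogy using heapq.nlargest, not the Polars implementation; because every (b, a) pair in this data is distinct, the result is fully determined and matches the output above.

```python
import heapq

rows = [("a", 2), ("b", 1), ("a", 1), ("b", 3), ("b", 2), ("c", 1)]

# Rank by b first, then break ties on b using a; tuples compare element by element.
top4 = heapq.nlargest(4, rows, key=lambda row: (row[1], row[0]))
print(top4)  # → [('b', 3), ('b', 2), ('a', 2), ('c', 1)]
```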