polars.Expr.value_counts

Expr.value_counts(*, sort: bool = False, parallel: bool = False, name: str | None = None, normalize: bool = False) → Expr

Count the occurrences of unique values.

Parameters:

sort: Sort the output by count in descending order. If set to False (default), the order of the output is not guaranteed.
parallel: Execute the computation in parallel.

Note

This option should likely not be enabled in a group by context, as the computation is already parallelized per group.
name: Name of the resulting count field. Defaults to “proportion” if normalize is True, and to “count” otherwise.
normalize: If True, return the relative frequency of each unique value instead of its absolute count.

Returns:

Expr: Expression of data type Struct with two fields: the unique value and its count (or its proportion, if normalize is True).

Examples

>>> df = pl.DataFrame(
...     {"color": ["red", "blue", "red", "green", "blue", "blue"]}
... )
>>> df.select(pl.col("color").value_counts())
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"red",2}   │
│ {"green",1} │
│ {"blue",3}  │
└─────────────┘

Sort the output by (descending) count and customize the count field name.

>>> df = df.select(pl.col("color").value_counts(sort=True, name="n"))
>>> df
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"blue",3}  │
│ {"red",2}   │
│ {"green",1} │
└─────────────┘

>>> df.unnest("color")
shape: (3, 2)
┌───────┬─────┐
│ color ┆ n   │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ blue  ┆ 3   │
│ red   ┆ 2   │
│ green ┆ 1   │
└───────┴─────┘