polars.Expr.value_counts

Expr.value_counts(*, sort: bool = False, parallel: bool = False, name: str_ | None = None, normalize: bool = False) → Expr

Count the occurrence of unique values.

Parameters:

sort: Sort the output by count, in descending order. If set to False (default), the order is non-deterministic.
parallel: Execute the computation in parallel. Note: this option should likely not be enabled in a group_by context, as the computation will already be parallelized per group.
name: Give the resulting count column a specific name; if normalize is True this defaults to “proportion”, otherwise defaults to “count”.
normalize: If True, return the relative frequency of each unique value instead of its count; the frequencies are normalized so that they sum to 1.0.

Returns:

Expr: Expression of type Struct, mapping unique values to their count (or proportion).

Examples

>>> df = pl.DataFrame(
...     {"color": ["red", "blue", "red", "green", "blue", "blue"]}
... )
>>> df_count = df.select(pl.col("color").value_counts())
>>> df_count
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"green",1} │
│ {"blue",3}  │
│ {"red",2}   │
└─────────────┘

>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬───────┐
│ color ┆ count │
│ ---   ┆ ---   │
│ str   ┆ u32   │
╞═══════╪═══════╡
│ green ┆ 1     │
│ blue  ┆ 3     │
│ red   ┆ 2     │
└───────┴───────┘

Sort the output by count in descending order, customize the field name, and normalize the counts to relative proportions that sum to 1.0.

>>> df_count = df.select(
...     pl.col("color").value_counts(
...         name="fraction",
...         normalize=True,
...         sort=True,
...     )
... )
>>> df_count
shape: (3, 1)
┌────────────────────┐
│ color              │
│ ---                │
│ struct[2]          │
╞════════════════════╡
│ {"blue",0.5}       │
│ {"red",0.333333}   │
│ {"green",0.166667} │
└────────────────────┘

>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
└───────┴──────────┘

Note that group_by can be used to generate counts.

>>> df.group_by("color").len()
shape: (3, 2)
┌───────┬─────┐
│ color ┆ len │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ red   ┆ 2   │
│ green ┆ 1   │
│ blue  ┆ 3   │
└───────┴─────┘

To add counts as a new column, pl.len() can be used as a window function.

>>> df.with_columns(pl.len().over("color"))
shape: (6, 2)
┌───────┬─────┐
│ color ┆ len │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ red   ┆ 2   │
│ blue  ┆ 3   │
│ red   ┆ 2   │
│ green ┆ 1   │
│ blue  ┆ 3   │
│ blue  ┆ 3   │
└───────┴─────┘

>>> df.with_columns((pl.len().over("color") / pl.len()).alias("fraction"))
shape: (6, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ red   ┆ 0.333333 │
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
│ blue  ┆ 0.5      │
│ blue  ┆ 0.5      │
└───────┴──────────┘