polars.Expr.value_counts
- Expr.value_counts(*, sort: bool = False, parallel: bool = False, name: str | None = None, normalize: bool = False) → Expr
Count the occurrences of unique values.
- Parameters:
- sort
Sort the output by count, in descending order. If set to False (default), the order is non-deterministic.
- parallel
Execute the computation in parallel.
Note
This option should likely not be enabled in a group_by context, as the computation will already be parallelized per group.
- name
Give the resulting count field a specific name; if normalize is True this defaults to “proportion”, otherwise it defaults to “count”.
- normalize
If True, the counts are returned as relative frequencies of the unique values, normalized so that they sum to 1.0.
- Returns:
- Expr
Expression of type Struct, mapping unique values to their count (or proportion).
Examples
>>> df = pl.DataFrame(
...     {"color": ["red", "blue", "red", "green", "blue", "blue"]}
... )
>>> df_count = df.select(pl.col("color").value_counts())
>>> df_count
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"green",1} │
│ {"blue",3}  │
│ {"red",2}   │
└─────────────┘
>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬───────┐
│ color ┆ count │
│ ---   ┆ ---   │
│ str   ┆ u32   │
╞═══════╪═══════╡
│ green ┆ 1     │
│ blue  ┆ 3     │
│ red   ┆ 2     │
└───────┴───────┘
Sort the output by count (descending), customize the field name, and normalize the counts to relative proportions that sum to 1.0.
>>> df_count = df.select(
...     pl.col("color").value_counts(
...         name="fraction",
...         normalize=True,
...         sort=True,
...     )
... )
>>> df_count
shape: (3, 1)
┌────────────────────┐
│ color              │
│ ---                │
│ struct[2]          │
╞════════════════════╡
│ {"blue",0.5}       │
│ {"red",0.333333}   │
│ {"green",0.166667} │
└────────────────────┘
>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
└───────┴──────────┘
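Each proportion is simply the value's count divided by the total number of rows. As a plain-Python cross-check (standard library only, not Polars API), `collections.Counter` reproduces the same numbers; its `most_common()` method orders by descending count, which mirrors `sort=True` here because no two counts tie.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "blue"]

counts = Counter(colors)
total = sum(counts.values())

# most_common() returns (value, count) pairs ordered by count, descending.
proportions = [(value, n / total) for value, n in counts.most_common()]
print(proportions)
```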
Note that group_by can also be used to generate counts.
>>> df.group_by("color").len()
shape: (3, 2)
┌───────┬─────┐
│ color ┆ len │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ red   ┆ 2   │
│ green ┆ 1   │
│ blue  ┆ 3   │
└───────┴─────┘
To add counts as a new column, pl.len() can be used as a window function.
>>> df.with_columns(pl.len().over("color"))
shape: (6, 2)
┌───────┬─────┐
│ color ┆ len │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ red   ┆ 2   │
│ blue  ┆ 3   │
│ red   ┆ 2   │
│ green ┆ 1   │
│ blue  ┆ 3   │
│ blue  ┆ 3   │
└───────┴─────┘
>>> df.with_columns((pl.len().over("color") / pl.len()).alias("fraction"))
shape: (6, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ red   ┆ 0.333333 │
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
│ blue  ┆ 0.5      │
│ blue  ┆ 0.5      │
└───────┴──────────┘