polars.Expr.value_counts
- Expr.value_counts(*, sort: bool = False, parallel: bool = False, name: str | None = None, normalize: bool = False) → Expr
Count the occurrence of unique values.
- Parameters:
  - sort
    Sort the output by count, in descending order. If set to False (default), the order is non-deterministic.
  - parallel
    Execute the computation in parallel.
    Note: This option should likely not be enabled in a group_by context, as the computation will already be parallelized per group.
  - name
    Give the resulting count field a specific name; if normalize is True, this defaults to “proportion”, otherwise it defaults to “count”.
  - normalize
    If True, the count is returned as the relative frequency of each unique value, normalized so that the frequencies sum to 1.0.
- Returns:
  - Expr
    Expression of type Struct, mapping unique values to their count (or proportion).
Examples
>>> df = pl.DataFrame(
...     {"color": ["red", "blue", "red", "green", "blue", "blue"]}
... )
>>> df_count = df.select(pl.col("color").value_counts())
>>> df_count
shape: (3, 1)
┌─────────────┐
│ color       │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {"green",1} │
│ {"blue",3}  │
│ {"red",2}   │
└─────────────┘
>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬───────┐
│ color ┆ count │
│ ---   ┆ ---   │
│ str   ┆ u32   │
╞═══════╪═══════╡
│ green ┆ 1     │
│ blue  ┆ 3     │
│ red   ┆ 2     │
└───────┴───────┘
Sort the output by count in descending order, customize the count field name, and normalize the counts to relative proportions that sum to 1.0.
>>> df_count = df.select(
...     pl.col("color").value_counts(
...         name="fraction",
...         normalize=True,
...         sort=True,
...     )
... )
>>> df_count
shape: (3, 1)
┌────────────────────┐
│ color              │
│ ---                │
│ struct[2]          │
╞════════════════════╡
│ {"blue",0.5}       │
│ {"red",0.333333}   │
│ {"green",0.166667} │
└────────────────────┘
>>> df_count.unnest("color")
shape: (3, 2)
┌───────┬──────────┐
│ color ┆ fraction │
│ ---   ┆ ---      │
│ str   ┆ f64      │
╞═══════╪══════════╡
│ blue  ┆ 0.5      │
│ red   ┆ 0.333333 │
│ green ┆ 0.166667 │
└───────┴──────────┘