polars.Series.qcut#

Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: Literal[True] = True,
) Series[source]#
Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: Literal[False],
) DataFrame
Series.qcut(
quantiles: Sequence[float] | int,
*,
labels: Sequence[str] | None = None,
left_closed: bool = False,
allow_duplicates: bool = False,
include_breaks: bool = False,
break_point_label: str = 'break_point',
category_label: str = 'category',
as_series: bool,
) Series | DataFrame

Bin continuous values into discrete categories based on their quantiles.

Parameters:
quantiles

Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.

labels

Names of the categories. The number of labels must be equal to the number of cut points plus one.

left_closed

Set the intervals to be left-closed instead of right-closed.

allow_duplicates

If set to True, duplicates in the resulting quantiles are dropped, rather than raising a DuplicateError. This can happen even with unique probabilities, depending on the data.

include_breaks

Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

break_point_label

Name of the breakpoint column. Only used if include_breaks is set to True.

Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.

category_label

Name of the category column. Only used if include_breaks is set to True.

Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.

as_series

If set to False, return a DataFrame containing the original values, the breakpoints, and the categories.

Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting include_breaks=True, unnesting the resulting struct Series, and adding the result to the original Series.

Returns:
Series

Series of data type Categorical if include_breaks is set to False (default), otherwise a Series of data type Struct.

Warning

This functionality is experimental and may change without it being considered a breaking change.

See also

cut

Examples

Divide a column into three categories according to pre-defined quantile probabilities.

>>> s = pl.Series("foo", [-2, -1, 0, 1, 2])
>>> s.qcut([0.25, 0.75], labels=["a", "b", "c"])
shape: (5,)
Series: 'foo' [cat]
[
        "a"
        "a"
        "b"
        "b"
        "c"
]

Divide a column into two categories using uniform quantile probabilities.

>>> s.qcut(2, labels=["low", "high"], left_closed=True)
shape: (5,)
Series: 'foo' [cat]
[
        "low"
        "low"
        "high"
        "high"
        "high"
]

Create a DataFrame with the breakpoint and category for each value.

>>> cut = s.qcut([0.25, 0.75], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘