polars.Series.qcut
- Series.qcut(
- quantiles: Sequence[float] | int,
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- allow_duplicates: bool = False,
- include_breaks: bool = False,
- break_point_label: str = 'break_point',
- category_label: str = 'category',
- as_series: Literal[True] = True,
- Series.qcut(
- quantiles: Sequence[float] | int,
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- allow_duplicates: bool = False,
- include_breaks: bool = False,
- break_point_label: str = 'break_point',
- category_label: str = 'category',
- as_series: Literal[False],
- Series.qcut(
- quantiles: Sequence[float] | int,
- *,
- labels: Sequence[str] | None = None,
- left_closed: bool = False,
- allow_duplicates: bool = False,
- include_breaks: bool = False,
- break_point_label: str = 'break_point',
- category_label: str = 'category',
- as_series: bool,
Bin continuous values into discrete categories based on their quantiles.
- Parameters:
- quantiles
Either a list of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.
- labels
Names of the categories. The number of labels must be equal to the number of cut points plus one.
- left_closed
Set the intervals to be left-closed instead of right-closed.
- allow_duplicates
If set to True, duplicates in the resulting quantiles are dropped, rather than raising a DuplicateError. This can happen even with unique probabilities, depending on the data.
- include_breaks
Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.
- break_point_label
Name of the breakpoint column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- category_label
Name of the category column. Only used if include_breaks is set to True.
Deprecated since version 0.19.0: This parameter will be removed. Use Series.struct.rename_fields to rename the field instead.
- as_series
If set to False, return a DataFrame containing the original values, the breakpoints, and the categories.
Deprecated since version 0.19.0: This parameter will be removed. The same behavior can be achieved by setting include_breaks=True, unnesting the resulting struct Series, and adding the result to the original Series.
- Returns:
- Series
Series of data type Categorical if include_breaks is set to False (default), otherwise a Series of data type Struct.
Warning
This functionality is experimental and may change without it being considered a breaking change.
See also
cut
Examples
Divide a column into three categories according to pre-defined quantile probabilities.
>>> s = pl.Series("foo", [-2, -1, 0, 1, 2])
>>> s.qcut([0.25, 0.75], labels=["a", "b", "c"])
shape: (5,)
Series: 'foo' [cat]
[
    "a"
    "a"
    "b"
    "b"
    "c"
]
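To make the link between quantile probabilities and categories concrete, the following pure-Python sketch reproduces the result above. It is not Polars' implementation: the helper names quantile_breaks and qcut_sketch are hypothetical, and linear interpolation between order statistics is an assumption made for illustration.

```python
from bisect import bisect_left, bisect_right


def quantile_breaks(values, probs):
    """Compute breakpoints by linear interpolation (an assumed rule)."""
    xs = sorted(values)
    breaks = []
    for p in probs:
        pos = p * (len(xs) - 1)          # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        breaks.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return breaks


def qcut_sketch(values, probs, labels, left_closed=False):
    """Assign each value the label of the interval it falls in."""
    breaks = quantile_breaks(values, probs)
    # Right-closed bins (a, b]: a value equal to a breakpoint stays in the lower bin.
    # Left-closed bins [a, b): a value equal to a breakpoint moves to the upper bin.
    find = bisect_right if left_closed else bisect_left
    return [labels[find(breaks, v)] for v in values]


result = qcut_sketch([-2, -1, 0, 1, 2], [0.25, 0.75], ["a", "b", "c"])
print(result)  # ['a', 'a', 'b', 'b', 'c']
```

Two probabilities yield two breakpoints (here -1 and 1) and therefore three bins, which is why three labels are required.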
Divide a column into two categories using uniform quantile probabilities.
>>> s.qcut(2, labels=["low", "high"], left_closed=True)
shape: (5,)
Series: 'foo' [cat]
[
    "low"
    "low"
    "high"
    "high"
    "high"
]
Create a DataFrame with the breakpoint and category for each value.
>>> cut = s.qcut([0.25, 0.75], include_breaks=True).alias("cut")
>>> s.to_frame().with_columns(cut).unnest("cut")
shape: (5, 3)
┌─────┬─────────────┬────────────┐
│ foo ┆ break_point ┆ category   │
│ --- ┆ ---         ┆ ---        │
│ i64 ┆ f64         ┆ cat        │
╞═════╪═════════════╪════════════╡
│ -2  ┆ -1.0        ┆ (-inf, -1] │
│ -1  ┆ -1.0        ┆ (-inf, -1] │
│ 0   ┆ 1.0         ┆ (-1, 1]    │
│ 1   ┆ 1.0         ┆ (-1, 1]    │
│ 2   ┆ inf         ┆ (1, inf]   │
└─────┴─────────────┴────────────┘