polars.DataFrame.pivot#

DataFrame.pivot(
values: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
index: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
columns: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
aggregate_function: PivotAgg | Expr | None | NoDefault = _NoDefault.no_default,
*,
maintain_order: bool = True,
sort_columns: bool = False,
separator: str = '_',
) Self[source]#

Create a spreadsheet-style pivot table as a DataFrame.

Parameters:
values

Column values to aggregate. Can be multiple columns if the columns arguments contains multiple columns as well.

index

One or multiple keys to group by.

columns

Name of the column(s) whose values will be used as the header of the output DataFrame.

aggregate_function

Choose from:

  • None: no aggregation takes place, will raise error if multiple values are in group.

  • A predefined aggregate function string, one of {‘first’, ‘sum’, ‘max’, ‘min’, ‘mean’, ‘median’, ‘last’, ‘count’}

  • An expression to do the aggregation.

maintain_order

Sort the grouped keys so that the output order is predictable.

sort_columns

Sort the transposed columns by name. Default is by order of discovery.

separator

Used as separator/delimiter in generated column names.

Returns:
DataFrame

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": ["one", "one", "two", "two", "one", "two"],
...         "bar": ["y", "y", "y", "x", "x", "x"],
...         "baz": [1, 2, 3, 4, 5, 6],
...     }
... )
>>> df.pivot(values="baz", index="foo", columns="bar", aggregate_function="sum")
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ y   ┆ x   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ one ┆ 3   ┆ 5   │
│ two ┆ 3   ┆ 10  │
└─────┴─────┴─────┘

Pivot using selectors to determine the index/values/columns:

>>> import polars.selectors as cs
>>> df.pivot(
...     values=cs.numeric(),
...     index=cs.string(),
...     columns=cs.string(),
...     aggregate_function="sum",
...     sort_columns=True,
... ).sort(
...     by=cs.string(),
... )
shape: (4, 6)
┌─────┬─────┬──────┬──────┬──────┬──────┐
│ foo ┆ bar ┆ one  ┆ two  ┆ x    ┆ y    │
│ --- ┆ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str ┆ str ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪═════╪══════╪══════╪══════╪══════╡
│ one ┆ x   ┆ 5    ┆ null ┆ 5    ┆ null │
│ one ┆ y   ┆ 3    ┆ null ┆ null ┆ 3    │
│ two ┆ x   ┆ null ┆ 10   ┆ 10   ┆ null │
│ two ┆ y   ┆ null ┆ 3    ┆ null ┆ 3    │
└─────┴─────┴──────┴──────┴──────┴──────┘

Run an expression as aggregation function

>>> df = pl.DataFrame(
...     {
...         "col1": ["a", "a", "a", "b", "b", "b"],
...         "col2": ["x", "x", "x", "x", "y", "y"],
...         "col3": [6, 7, 3, 2, 5, 7],
...     }
... )
>>> df.pivot(
...     index="col1",
...     columns="col2",
...     values="col3",
...     aggregate_function=pl.element().tanh().mean(),
... )
shape: (2, 3)
┌──────┬──────────┬──────────┐
│ col1 ┆ x        ┆ y        │
│ ---  ┆ ---      ┆ ---      │
│ str  ┆ f64      ┆ f64      │
╞══════╪══════════╪══════════╡
│ a    ┆ 0.998347 ┆ null     │
│ b    ┆ 0.964028 ┆ 0.999954 │
└──────┴──────────┴──────────┘