polars.DataFrame.pivot#

DataFrame.pivot(
index: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
columns: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
values: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
aggregate_function: PivotAgg | Expr | None = None,
*,
maintain_order: bool = True,
sort_columns: bool = False,
separator: str = '_',
) Self[source]#

Create a spreadsheet-style pivot table as a DataFrame.

Only available in eager mode. See “Examples” section below for how to do a “lazy pivot” if you know the unique column values in advance.

Parameters:
values

Column values to aggregate. If None, all remaining columns will be used.

index

One or multiple keys to group by.

columns

Name of the column(s) whose values will be used as the header of the output DataFrame.

aggregate_function

Choose from:

  • None: no aggregation takes place, will raise error if multiple values are in group.

  • A predefined aggregate function string, one of {‘min’, ‘max’, ‘first’, ‘last’, ‘sum’, ‘mean’, ‘median’, ‘len’}

  • An expression to do the aggregation.

maintain_order

Sort the grouped keys so that the output order is predictable.

sort_columns

Sort the transposed columns by name. Default is by order of discovery.

separator

Used as separator/delimiter in generated column names in case of multiple value columns.

Returns:
DataFrame

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": ["one", "one", "two", "two", "one", "two"],
...         "bar": ["y", "y", "y", "x", "x", "x"],
...         "baz": [1, 2, 3, 4, 5, 6],
...     }
... )
>>> df.pivot(index="foo", columns="bar", values="baz", aggregate_function="sum")
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ y   ┆ x   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ one ┆ 3   ┆ 5   │
│ two ┆ 3   ┆ 10  │
└─────┴─────┴─────┘

Pivot using selectors to determine the index/values/columns:

>>> import polars.selectors as cs
>>> df.pivot(
...     index=cs.string(),
...     columns=cs.string(),
...     values=cs.numeric(),
...     aggregate_function="sum",
...     sort_columns=True,
... ).sort(
...     by=cs.string(),
... )
shape: (4, 6)
┌─────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ foo ┆ bar ┆ {"one","x"} ┆ {"one","y"} ┆ {"two","x"} ┆ {"two","y"} │
│ --- ┆ --- ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│ str ┆ str ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞═════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ one ┆ x   ┆ 5           ┆ null        ┆ null        ┆ null        │
│ one ┆ y   ┆ null        ┆ 3           ┆ null        ┆ null        │
│ two ┆ x   ┆ null        ┆ null        ┆ 10          ┆ null        │
│ two ┆ y   ┆ null        ┆ null        ┆ null        ┆ 3           │
└─────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘

Run an expression as aggregation function

>>> df = pl.DataFrame(
...     {
...         "col1": ["a", "a", "a", "b", "b", "b"],
...         "col2": ["x", "x", "x", "x", "y", "y"],
...         "col3": [6, 7, 3, 2, 5, 7],
...     }
... )
>>> df.pivot(
...     index="col1",
...     columns="col2",
...     values="col3",
...     aggregate_function=pl.element().tanh().mean(),
... )
shape: (2, 3)
┌──────┬──────────┬──────────┐
│ col1 ┆ x        ┆ y        │
│ ---  ┆ ---      ┆ ---      │
│ str  ┆ f64      ┆ f64      │
╞══════╪══════════╪══════════╡
│ a    ┆ 0.998347 ┆ null     │
│ b    ┆ 0.964028 ┆ 0.999954 │
└──────┴──────────┴──────────┘

Note that pivot is only available in eager mode. If you know the unique column values in advance, you can use polars.LazyFrame.group_by() to get the same result as above in lazy mode:

>>> index = pl.col("col1")
>>> columns = pl.col("col2")
>>> values = pl.col("col3")
>>> unique_column_values = ["x", "y"]
>>> aggregate_function = lambda col: col.tanh().mean()
>>> df.lazy().group_by(index).agg(
...     aggregate_function(values.filter(columns == value)).alias(value)
...     for value in unique_column_values
... ).collect()  
shape: (2, 3)
┌──────┬──────────┬──────────┐
│ col1 ┆ x        ┆ y        │
│ ---  ┆ ---      ┆ ---      │
│ str  ┆ f64      ┆ f64      │
╞══════╪══════════╪══════════╡
│ a    ┆ 0.998347 ┆ null     │
│ b    ┆ 0.964028 ┆ 0.999954 │
└──────┴──────────┴──────────┘

Using a custom separator in generated column names:

>>> df = pl.DataFrame(
...     {
...         "ix": [1, 1, 2, 2, 1, 2],
...         "col": ["a", "a", "a", "a", "b", "b"],
...         "foo": [0, 1, 2, 2, 7, 1],
...         "bar": [0, 2, 0, 0, 9, 4],
...     }
... )
>>> df.pivot(
...     index="ix",
...     columns="col",
...     values=["foo", "bar"],
...     aggregate_function="sum",
...     separator="/",
... )
shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────┐
│ ix  ┆ foo/a ┆ foo/b ┆ bar/a ┆ bar/b │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═════╪═══════╪═══════╪═══════╪═══════╡
│ 1   ┆ 1     ┆ 7     ┆ 2     ┆ 9     │
│ 2   ┆ 4     ┆ 1     ┆ 0     ┆ 4     │
└─────┴───────┴───────┴───────┴───────┘