polars.DataFrame.pivot#

Create a spreadsheet-style pivot table as a DataFrame.

Only available in eager mode. See “Examples” section below for how to do a “lazy pivot” if you know the unique column values in advance.

Changed in version 1.0.0: The columns parameter was renamed on.

Parameters:

on

The column(s) whose values will be used as the new columns of the output DataFrame.

on_columns

What value combinations will be considered for the output table.

index

The column(s) that remain from the input to the output. The output DataFrame will have one row for each unique combination of the index’s values. If None, all remaining columns not specified on on and values will be used. At least one of index and values must be specified.

values

The existing column(s) of values which will be moved under the new columns from index. If an aggregation is specified, these are the values on which the aggregation will be computed. If None, all remaining columns not specified on on and index will be used. At least one of index and values must be specified.

aggregate_function

Choose from:

None: no aggregation takes place, will raise error if multiple values are in group.
A predefined aggregate function string, one of {‘min’, ‘max’, ‘first’, ‘last’, ‘sum’, ‘mean’, ‘median’, ‘len’}
An expression to do the aggregation. The expression can only access data from the respective ‘values’ columns as generated by pivot, through pl.element().

maintain_order

Ensure the values of index are sorted by discovery order.

sort_columns

Sort the transposed columns by name. Default is by order of discovery.

separator

Used as separator/delimiter in generated column names in case of multiple values columns.

Returns:

DataFrame

See also

LazyFrame.pivot

Notes

In some other frameworks, you might know this operation as pivot_wider.

Examples

You can use pivot to reshape a dataframe from “long” to “wide” format.

For example, suppose we have a dataframe of test scores achieved by some students, where each row represents a distinct test.

>>> df = pl.DataFrame(
...     {
...         "name": ["Cady", "Cady", "Karen", "Karen"],
...         "subject": ["maths", "physics", "maths", "physics"],
...         "test_1": [98, 99, 61, 58],
...         "test_2": [100, 100, 60, 60],
...     }
... )
>>> df
shape: (4, 4)
┌───────┬─────────┬────────┬────────┐
│ name  ┆ subject ┆ test_1 ┆ test_2 │
│ ---   ┆ ---     ┆ ---    ┆ ---    │
│ str   ┆ str     ┆ i64    ┆ i64    │
╞═══════╪═════════╪════════╪════════╡
│ Cady  ┆ maths   ┆ 98     ┆ 100    │
│ Cady  ┆ physics ┆ 99     ┆ 100    │
│ Karen ┆ maths   ┆ 61     ┆ 60     │
│ Karen ┆ physics ┆ 58     ┆ 60     │
└───────┴─────────┴────────┴────────┘

Using pivot, we can reshape so we have one row per student, with different subjects as columns, and their test_1 scores as values:

>>> df.pivot("subject", index="name", values="test_1")
shape: (2, 3)
┌───────┬───────┬─────────┐
│ name  ┆ maths ┆ physics │
│ ---   ┆ ---   ┆ ---     │
│ str   ┆ i64   ┆ i64     │
╞═══════╪═══════╪═════════╡
│ Cady  ┆ 98    ┆ 99      │
│ Karen ┆ 61    ┆ 58      │
└───────┴───────┴─────────┘

If you want to only pivot over a limited set of subject values or already know the subject values ahead of time, you can provide these using the on_columns argument.

>>> df.pivot(
...     "subject",
...     on_columns=["maths", "physics"],
...     index="name",
...     values="test_1",
... )
shape: (2, 3)
┌───────┬───────┬─────────┐
│ name  ┆ maths ┆ physics │
│ ---   ┆ ---   ┆ ---     │
│ str   ┆ i64   ┆ i64     │
╞═══════╪═══════╪═════════╡
│ Cady  ┆ 98    ┆ 99      │
│ Karen ┆ 61    ┆ 58      │
└───────┴───────┴─────────┘

You can use selectors too - here we include all test scores in the pivoted table:

>>> import polars.selectors as cs
>>> df.pivot("subject", values=cs.starts_with("test"))
shape: (2, 5)
┌───────┬──────────────┬────────────────┬──────────────┬────────────────┐
│ name  ┆ test_1_maths ┆ test_1_physics ┆ test_2_maths ┆ test_2_physics │
│ ---   ┆ ---          ┆ ---            ┆ ---          ┆ ---            │
│ str   ┆ i64          ┆ i64            ┆ i64          ┆ i64            │
╞═══════╪══════════════╪════════════════╪══════════════╪════════════════╡
│ Cady  ┆ 98           ┆ 99             ┆ 100          ┆ 100            │
│ Karen ┆ 61           ┆ 58             ┆ 60           ┆ 60             │
└───────┴──────────────┴────────────────┴──────────────┴────────────────┘

If you end up with multiple values per cell, you can specify how to aggregate them with aggregate_function:

>>> df = pl.DataFrame(
...     {
...         "ix": [1, 1, 2, 2, 1, 2],
...         "col": ["a", "a", "a", "a", "b", "b"],
...         "foo": [0, 1, 2, 2, 7, 1],
...         "bar": [0, 2, 0, 0, 9, 4],
...     }
... )
>>> df.pivot("col", index="ix", aggregate_function="sum")
shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────┐
│ ix  ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═════╪═══════╪═══════╪═══════╪═══════╡
│ 1   ┆ 1     ┆ 7     ┆ 2     ┆ 9     │
│ 2   ┆ 4     ┆ 1     ┆ 0     ┆ 4     │
└─────┴───────┴───────┴───────┴───────┘

You can also pass a custom aggregation function using polars.element():

>>> df = pl.DataFrame(
...     {
...         "col1": ["a", "a", "a", "b", "b", "b"],
...         "col2": ["x", "x", "x", "x", "y", "y"],
...         "col3": [6, 7, 3, 2, 5, 7],
...     }
... )
>>> df.pivot(
...     "col2",
...     index="col1",
...     values="col3",
...     aggregate_function=pl.element().tanh().mean(),
... )
shape: (2, 3)
┌──────┬──────────┬──────────┐
│ col1 ┆ x        ┆ y        │
│ ---  ┆ ---      ┆ ---      │
│ str  ┆ f64      ┆ f64      │
╞══════╪══════════╪══════════╡
│ a    ┆ 0.998347 ┆ null     │
│ b    ┆ 0.964028 ┆ 0.999954 │
└──────┴──────────┴──────────┘