polars.DataFrame.pivot#
- DataFrame.pivot(
- on: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
- *,
- index: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
- values: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None = None,
- aggregate_function: PivotAgg | Expr | None = None,
- maintain_order: bool = True,
- sort_columns: bool = False,
- separator: str = '_',
Create a spreadsheet-style pivot table as a DataFrame.
Only available in eager mode. See “Examples” section below for how to do a “lazy pivot” if you know the unique column values in advance.
Changed in version 1.0.0: The
columnsparameter was renamedon.- Parameters:
- on
The column(s) whose values will be used as the new columns of the output DataFrame.
- index
The column(s) that remain from the input to the output. The output DataFrame will have one row for each unique combination of the
index’s values. If None, all remaining columns not specified ononandvalueswill be used. At least one ofindexandvaluesmust be specified.- values
The existing column(s) of values which will be moved under the new columns from index. If an aggregation is specified, these are the values on which the aggregation will be computed. If None, all remaining columns not specified on
onandindexwill be used. At least one ofindexandvaluesmust be specified.- aggregate_function
Choose from:
None: no aggregation takes place, will raise error if multiple values are in group.
A predefined aggregate function string, one of {‘min’, ‘max’, ‘first’, ‘last’, ‘sum’, ‘mean’, ‘median’, ‘len’}
An expression to do the aggregation. The expression can only access data from the respective ‘values’ columns as generated by pivot, through
pl.element().
- maintain_order
Ensure the values of
indexare sorted by discovery order.- sort_columns
Sort the transposed columns by name. Default is by order of discovery.
- separator
Used as separator/delimiter in generated column names in case of multiple
valuescolumns.
- Returns:
- DataFrame
Notes
In some other frameworks, you might know this operation as
pivot_wider.Examples
You can use
pivotto reshape a dataframe from “long” to “wide” format.For example, suppose we have a dataframe of test scores achieved by some students, where each row represents a distinct test.
>>> df = pl.DataFrame( ... { ... "name": ["Cady", "Cady", "Karen", "Karen"], ... "subject": ["maths", "physics", "maths", "physics"], ... "test_1": [98, 99, 61, 58], ... "test_2": [100, 100, 60, 60], ... } ... ) >>> df shape: (4, 4) ┌───────┬─────────┬────────┬────────┐ │ name ┆ subject ┆ test_1 ┆ test_2 │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ str ┆ i64 ┆ i64 │ ╞═══════╪═════════╪════════╪════════╡ │ Cady ┆ maths ┆ 98 ┆ 100 │ │ Cady ┆ physics ┆ 99 ┆ 100 │ │ Karen ┆ maths ┆ 61 ┆ 60 │ │ Karen ┆ physics ┆ 58 ┆ 60 │ └───────┴─────────┴────────┴────────┘
Using
pivot, we can reshape so we have one row per student, with different subjects as columns, and theirtest_1scores as values:>>> df.pivot("subject", index="name", values="test_1") shape: (2, 3) ┌───────┬───────┬─────────┐ │ name ┆ maths ┆ physics │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 │ ╞═══════╪═══════╪═════════╡ │ Cady ┆ 98 ┆ 99 │ │ Karen ┆ 61 ┆ 58 │ └───────┴───────┴─────────┘
You can use selectors too - here we include all test scores in the pivoted table:
>>> import polars.selectors as cs >>> df.pivot("subject", values=cs.starts_with("test")) shape: (2, 5) ┌───────┬──────────────┬────────────────┬──────────────┬────────────────┐ │ name ┆ test_1_maths ┆ test_1_physics ┆ test_2_maths ┆ test_2_physics │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═══════╪══════════════╪════════════════╪══════════════╪════════════════╡ │ Cady ┆ 98 ┆ 99 ┆ 100 ┆ 100 │ │ Karen ┆ 61 ┆ 58 ┆ 60 ┆ 60 │ └───────┴──────────────┴────────────────┴──────────────┴────────────────┘
If you end up with multiple values per cell, you can specify how to aggregate them with
aggregate_function:>>> df = pl.DataFrame( ... { ... "ix": [1, 1, 2, 2, 1, 2], ... "col": ["a", "a", "a", "a", "b", "b"], ... "foo": [0, 1, 2, 2, 7, 1], ... "bar": [0, 2, 0, 0, 9, 4], ... } ... ) >>> df.pivot("col", index="ix", aggregate_function="sum") shape: (2, 5) ┌─────┬───────┬───────┬───────┬───────┐ │ ix ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═══════╪═══════╪═══════╪═══════╡ │ 1 ┆ 1 ┆ 7 ┆ 2 ┆ 9 │ │ 2 ┆ 4 ┆ 1 ┆ 0 ┆ 4 │ └─────┴───────┴───────┴───────┴───────┘
You can also pass a custom aggregation function using
polars.element():>>> df = pl.DataFrame( ... { ... "col1": ["a", "a", "a", "b", "b", "b"], ... "col2": ["x", "x", "x", "x", "y", "y"], ... "col3": [6, 7, 3, 2, 5, 7], ... } ... ) >>> df.pivot( ... "col2", ... index="col1", ... values="col3", ... aggregate_function=pl.element().tanh().mean(), ... ) shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘
Note that
pivotis only available in eager mode. If you know the unique column values in advance, you can usepolars.LazyFrame.group_by()to get the same result as above in lazy mode:>>> index = pl.col("col1") >>> on = pl.col("col2") >>> values = pl.col("col3") >>> unique_column_values = ["x", "y"] >>> aggregate_function = lambda col: col.tanh().mean() >>> df.lazy().group_by(index).agg( ... aggregate_function(values.filter(on == value)).alias(value) ... for value in unique_column_values ... ).collect() shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘