polars.DataFrame.pivot#
- DataFrame.pivot(
- index: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- columns: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- values: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- aggregate_function: PivotAgg | Expr | None = None,
- *,
- maintain_order: bool = True,
- sort_columns: bool = False,
- separator: str = '_',
Create a spreadsheet-style pivot table as a DataFrame.
Only available in eager mode. See “Examples” section below for how to do a “lazy pivot” if you know the unique column values in advance.
- Parameters:
- values
Column values to aggregate. If None, all remaining columns will be used.
- index
One or multiple keys to group by.
- columns
Name of the column(s) whose values will be used as the header of the output DataFrame.
- aggregate_function
Choose from:
None: no aggregation takes place, will raise error if multiple values are in group.
A predefined aggregate function string, one of {‘min’, ‘max’, ‘first’, ‘last’, ‘sum’, ‘mean’, ‘median’, ‘len’}
An expression to do the aggregation.
- maintain_order
Sort the grouped keys so that the output order is predictable.
- sort_columns
Sort the transposed columns by name. Default is by order of discovery.
- separator
Used as separator/delimiter in generated column names in case of multiple value columns.
- Returns:
- DataFrame
Examples
>>> df = pl.DataFrame( ... { ... "foo": ["one", "one", "two", "two", "one", "two"], ... "bar": ["y", "y", "y", "x", "x", "x"], ... "baz": [1, 2, 3, 4, 5, 6], ... } ... ) >>> df.pivot(index="foo", columns="bar", values="baz", aggregate_function="sum") shape: (2, 3) ┌─────┬─────┬─────┐ │ foo ┆ y ┆ x │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ one ┆ 3 ┆ 5 │ │ two ┆ 3 ┆ 10 │ └─────┴─────┴─────┘
Pivot using selectors to determine the index/values/columns:
>>> import polars.selectors as cs >>> df.pivot( ... index=cs.string(), ... columns=cs.string(), ... values=cs.numeric(), ... aggregate_function="sum", ... sort_columns=True, ... ).sort( ... by=cs.string(), ... ) shape: (4, 6) ┌─────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐ │ foo ┆ bar ┆ {"one","x"} ┆ {"one","y"} ┆ {"two","x"} ┆ {"two","y"} │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡ │ one ┆ x ┆ 5 ┆ null ┆ null ┆ null │ │ one ┆ y ┆ null ┆ 3 ┆ null ┆ null │ │ two ┆ x ┆ null ┆ null ┆ 10 ┆ null │ │ two ┆ y ┆ null ┆ null ┆ null ┆ 3 │ └─────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
Run an expression as aggregation function
>>> df = pl.DataFrame( ... { ... "col1": ["a", "a", "a", "b", "b", "b"], ... "col2": ["x", "x", "x", "x", "y", "y"], ... "col3": [6, 7, 3, 2, 5, 7], ... } ... ) >>> df.pivot( ... index="col1", ... columns="col2", ... values="col3", ... aggregate_function=pl.element().tanh().mean(), ... ) shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘
Note that
pivot
is only available in eager mode. If you know the unique column values in advance, you can usepolars.LazyFrame.group_by()
to get the same result as above in lazy mode:>>> index = pl.col("col1") >>> columns = pl.col("col2") >>> values = pl.col("col3") >>> unique_column_values = ["x", "y"] >>> aggregate_function = lambda col: col.tanh().mean() >>> df.lazy().group_by(index).agg( ... aggregate_function(values.filter(columns == value)).alias(value) ... for value in unique_column_values ... ).collect() shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘
Using a custom
separator
in generated column names:>>> df = pl.DataFrame( ... { ... "ix": [1, 1, 2, 2, 1, 2], ... "col": ["a", "a", "a", "a", "b", "b"], ... "foo": [0, 1, 2, 2, 7, 1], ... "bar": [0, 2, 0, 0, 9, 4], ... } ... ) >>> df.pivot( ... index="ix", ... columns="col", ... values=["foo", "bar"], ... aggregate_function="sum", ... separator="/", ... ) shape: (2, 5) ┌─────┬───────┬───────┬───────┬───────┐ │ ix ┆ foo/a ┆ foo/b ┆ bar/a ┆ bar/b │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═══════╪═══════╪═══════╪═══════╡ │ 1 ┆ 1 ┆ 7 ┆ 2 ┆ 9 │ │ 2 ┆ 4 ┆ 1 ┆ 0 ┆ 4 │ └─────┴───────┴───────┴───────┴───────┘