polars.DataFrame.pivot#
- DataFrame.pivot(
- *,
- values: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- index: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- columns: ColumnNameOrSelector | Sequence[ColumnNameOrSelector] | None,
- aggregate_function: PivotAgg | Expr | None = None,
- maintain_order: bool = True,
- sort_columns: bool = False,
- separator: str = '_',
Create a spreadsheet-style pivot table as a DataFrame.
Only available in eager mode. See “Examples” section below for how to do a “lazy pivot” if you know the unique column values in advance.
- Parameters:
- values
Column values to aggregate. If None, all remaining columns will be used.
- index
One or multiple keys to group by.
- columns
Name of the column(s) whose values will be used as the header of the output DataFrame.
- aggregate_function
Choose from:
None: no aggregation takes place, will raise error if multiple values are in group.
A predefined aggregate function string, one of {‘min’, ‘max’, ‘first’, ‘last’, ‘sum’, ‘mean’, ‘median’, ‘len’}
An expression to do the aggregation.
- maintain_order
Sort the grouped keys so that the output order is predictable.
- sort_columns
Sort the transposed columns by name. Default is by order of discovery.
- separator
Used as separator/delimiter in generated column names in case of multiple value columns.
- Returns:
- DataFrame
Examples
>>> df = pl.DataFrame( ... { ... "foo": ["one", "one", "two", "two", "one", "two"], ... "bar": ["y", "y", "y", "x", "x", "x"], ... "baz": [1, 2, 3, 4, 5, 6], ... } ... ) >>> df.pivot(index="foo", columns="bar", values="baz", aggregate_function="sum") shape: (2, 3) ┌─────┬─────┬─────┐ │ foo ┆ y ┆ x │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ one ┆ 3 ┆ 5 │ │ two ┆ 3 ┆ 10 │ └─────┴─────┴─────┘
Pivot using selectors to determine the index/values/columns:
>>> import polars.selectors as cs >>> df.pivot( ... index=cs.string(), ... columns=cs.string(), ... values=cs.numeric(), ... aggregate_function="sum", ... sort_columns=True, ... ).sort( ... by=cs.string(), ... ) shape: (4, 6) ┌─────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐ │ foo ┆ bar ┆ {"one","x"} ┆ {"one","y"} ┆ {"two","x"} ┆ {"two","y"} │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡ │ one ┆ x ┆ 5 ┆ null ┆ null ┆ null │ │ one ┆ y ┆ null ┆ 3 ┆ null ┆ null │ │ two ┆ x ┆ null ┆ null ┆ 10 ┆ null │ │ two ┆ y ┆ null ┆ null ┆ null ┆ 3 │ └─────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
Run an expression as aggregation function
>>> df = pl.DataFrame( ... { ... "col1": ["a", "a", "a", "b", "b", "b"], ... "col2": ["x", "x", "x", "x", "y", "y"], ... "col3": [6, 7, 3, 2, 5, 7], ... } ... ) >>> df.pivot( ... index="col1", ... columns="col2", ... values="col3", ... aggregate_function=pl.element().tanh().mean(), ... ) shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘
Note that
pivot
is only available in eager mode. If you know the unique column values in advance, you can usepolars.LazyFrame.groupby()
to get the same result as above in lazy mode:>>> index = pl.col("col1") >>> columns = pl.col("col2") >>> values = pl.col("col3") >>> unique_column_values = ["x", "y"] >>> aggregate_function = lambda col: col.tanh().mean() >>> df.lazy().group_by(index).agg( ... aggregate_function(values.filter(columns == value)).alias(value) ... for value in unique_column_values ... ).collect() shape: (2, 3) ┌──────┬──────────┬──────────┐ │ col1 ┆ x ┆ y │ │ --- ┆ --- ┆ --- │ │ str ┆ f64 ┆ f64 │ ╞══════╪══════════╪══════════╡ │ a ┆ 0.998347 ┆ null │ │ b ┆ 0.964028 ┆ 0.999954 │ └──────┴──────────┴──────────┘
Using a custom
separator
in generated column names:>>> df = pl.DataFrame( ... { ... "ix": [1, 1, 2, 2, 1, 2], ... "col": ["a", "a", "a", "a", "b", "b"], ... "foo": [0, 1, 2, 2, 7, 1], ... "bar": [0, 2, 0, 0, 9, 4], ... } ... ) >>> df.pivot( ... index="ix", ... columns="col", ... values=["foo", "bar"], ... aggregate_function="sum", ... separator="/", ... ) shape: (2, 5) ┌─────┬───────────┬───────────┬───────────┬───────────┐ │ ix ┆ foo/col/a ┆ foo/col/b ┆ bar/col/a ┆ bar/col/b │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═══════════╪═══════════╪═══════════╪═══════════╡ │ 1 ┆ 1 ┆ 7 ┆ 2 ┆ 9 │ │ 2 ┆ 4 ┆ 1 ┆ 0 ┆ 4 │ └─────┴───────────┴───────────┴───────────┴───────────┘