polars.align_frames

polars.align_frames(
*frames: FrameType,
on: str | Expr | Sequence[str] | Sequence[Expr] | Sequence[str | Expr],
how: JoinStrategy = 'full',
select: str | Expr | Sequence[str | Expr] | None = None,
descending: bool | Sequence[bool] = False,
) -> list[FrameType]

Align a sequence of frames using common values from one or more columns as a key.

Frames that do not contain the given key values have rows injected (with nulls filling the non-key columns), and each resulting frame is sorted by the key.

The original column order of each input frame is preserved unless select is specified, in which case select determines the final column order.

If duplicate key values exist, the alignment behaviour is determined by the strategy given in the how parameter. The default is a full outer join; if the first frame contains all required keys, setting how="left" can give a large speedup.

Note that this function does not return a joined frame: you receive the same number of frames back that you passed in, but each is now aligned by key and has the same number of rows.

Parameters:
frames

Sequence of DataFrames or LazyFrames.

on

One or more columns whose unique values will be used to align the frames.

select

Optional post-alignment column select to constrain and/or order the columns returned from the newly aligned frames.

descending

Sort the alignment column values in descending order; can be a single boolean or a sequence of booleans, one per column in on.

how

By default the row alignment values are determined using a full outer join strategy across all frames; if you know that the first frame contains all required keys, you can set how="left" for a large performance increase.

Examples

>>> from datetime import date
>>> import polars as pl
>>> df1 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 1), date(2022, 9, 2), date(2022, 9, 3)],
...         "x": [3.5, 4.0, 1.0],
...         "y": [10.0, 2.5, 1.5],
...     }
... )
>>> df2 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 2), date(2022, 9, 3), date(2022, 9, 1)],
...         "x": [8.0, 1.0, 3.5],
...         "y": [1.5, 12.0, 5.0],
...     }
... )
>>> df3 = pl.DataFrame(
...     {
...         "dt": [date(2022, 9, 3), date(2022, 9, 2)],
...         "x": [2.0, 5.0],
...         "y": [2.5, 2.0],
...     }
... )
>>> pl.Config.set_tbl_formatting("UTF8_FULL")
#
# df1                              df2                              df3
# shape: (3, 3)                    shape: (3, 3)                    shape: (2, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬─────┬─────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y   │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ --- │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64 │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪═════╪═════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │\  ,->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │\  ,->│ 2022-09-03 ┆ 2.0 ┆ 2.5 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤ \/   ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │_/\,->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │_/`-->│ 2022-09-02 ┆ 5.0 ┆ 2.0 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤  /\  ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      └────────────┴─────┴─────┘
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │_/  `>│ 2022-09-01 ┆ 3.5 ┆ 5.0  │-//-
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘
...

Align frames by the “dt” column:

>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt"
... )
#
# af1                              af2                              af3
# shape: (3, 3)                    shape: (3, 3)                    shape: (3, 3)
# ┌────────────┬─────┬──────┐      ┌────────────┬─────┬──────┐      ┌────────────┬──────┬──────┐
# │ dt         ┆ x   ┆ y    │      │ dt         ┆ x   ┆ y    │      │ dt         ┆ x    ┆ y    │
# │ ---        ┆ --- ┆ ---  │      │ ---        ┆ --- ┆ ---  │      │ ---        ┆ ---  ┆ ---  │
# │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64 ┆ f64  │      │ date       ┆ f64  ┆ f64  │
# ╞════════════╪═════╪══════╡      ╞════════════╪═════╪══════╡      ╞════════════╪══════╪══════╡
# │ 2022-09-01 ┆ 3.5 ┆ 10.0 │----->│ 2022-09-01 ┆ 3.5 ┆ 5.0  │----->│ 2022-09-01 ┆ null ┆ null │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-02 ┆ 4.0 ┆ 2.5  │----->│ 2022-09-02 ┆ 8.0 ┆ 1.5  │----->│ 2022-09-02 ┆ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2022-09-03 ┆ 1.0 ┆ 1.5  │----->│ 2022-09-03 ┆ 1.0 ┆ 12.0 │----->│ 2022-09-03 ┆ 2.0  ┆ 2.5  │
# └────────────┴─────┴──────┘      └────────────┴─────┴──────┘      └────────────┴──────┴──────┘
...

Align frames by “dt” using “left” alignment, keeping only columns “x” and “y”:

>>> af1, af2, af3 = pl.align_frames(
...     df1, df2, df3, on="dt", select=["x", "y"], how="left"
... )
#
# af1                 af2                 af3
# shape: (3, 2)       shape: (3, 2)       shape: (3, 2)
# ┌─────┬──────┐      ┌─────┬──────┐      ┌──────┬──────┐
# │ x   ┆ y    │      │ x   ┆ y    │      │ x    ┆ y    │
# │ --- ┆ ---  │      │ --- ┆ ---  │      │ ---  ┆ ---  │
# │ f64 ┆ f64  │      │ f64 ┆ f64  │      │ f64  ┆ f64  │
# ╞═════╪══════╡      ╞═════╪══════╡      ╞══════╪══════╡
# │ 3.5 ┆ 10.0 │      │ 3.5 ┆ 5.0  │      │ null ┆ null │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 4.0 ┆ 2.5  │      │ 8.0 ┆ 1.5  │      │ 5.0  ┆ 2.0  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌┼╌╌╌╌╌╌┤      ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 1.0 ┆ 1.5  │      │ 1.0 ┆ 12.0 │      │ 2.0  ┆ 2.5  │
# └─────┴──────┘      └─────┴──────┘      └──────┴──────┘
...

Now that the data is aligned, you can calculate the row-wise dot product:

>>> (af1 * af2 * af3).fill_null(0).select(pl.sum_horizontal("*").alias("dot"))
shape: (3, 1)
┌───────┐
│ dot   │
│ ---   │
│ f64   │
╞═══════╡
│ 0.0   │
├╌╌╌╌╌╌╌┤
│ 167.5 │
├╌╌╌╌╌╌╌┤
│ 47.0  │
└───────┘