polars.DataFrame.join#

Join in SQL-like fashion.

Parameters:

other

DataFrame to join with.

on

Name(s) of the join columns in both DataFrames.

how{‘inner’, ‘left’, ‘full’, ‘semi’, ‘anti’, ‘cross’}

Join strategy.

inner
Returns rows that have matching values in both tables
left
Returns all rows from the left table, and the matched rows from the right table
full
Returns all rows when there is a match in either left or right table
cross
Returns the Cartesian product of rows from both tables
semi
Filter rows that have a match in the right table.
anti
Filter rows that do not have a match in the right table.

Note

A left join preserves the row order of the left DataFrame.

left_on

Name(s) of the left join column(s).

right_on

Name(s) of the right join column(s).

suffix

Suffix to append to columns with a duplicate name.

validate: {‘m:m’, ‘m:1’, ‘1:m’, ‘1:1’}

Checks if join is of specified type.

many_to_many
“m:m”: default, does not result in checks

one_to_one
“1:1”: check if join keys are unique in both left and right datasets

one_to_many
“1:m”: check if join keys are unique in left dataset

many_to_one
“m:1”: check if join keys are unique in right dataset

Note

This is currently not supported by the streaming engine.

join_nulls

Join on null values. By default null values will never produce matches.

coalesce

Coalescing behavior (merging of join columns).

None: -> join specific.
True: -> Always coalesce join columns.
False: -> Never coalesce join columns.

Returns:

DataFrame

See also

join_asof

Notes

For joining on columns with categorical data, see polars.StringCache.

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> other_df = pl.DataFrame(
...     {
...         "apple": ["x", "y", "z"],
...         "ham": ["a", "b", "d"],
...     }
... )
>>> df.join(other_df, on="ham")
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘

>>> df.join(other_df, on="ham", how="full")
shape: (4, 5)
┌──────┬──────┬──────┬───────┬───────────┐
│ foo  ┆ bar  ┆ ham  ┆ apple ┆ ham_right │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---       │
│ i64  ┆ f64  ┆ str  ┆ str   ┆ str       │
╞══════╪══════╪══════╪═══════╪═══════════╡
│ 1    ┆ 6.0  ┆ a    ┆ x     ┆ a         │
│ 2    ┆ 7.0  ┆ b    ┆ y     ┆ b         │
│ null ┆ null ┆ null ┆ z     ┆ d         │
│ 3    ┆ 8.0  ┆ c    ┆ null  ┆ null      │
└──────┴──────┴──────┴───────┴───────────┘

>>> df.join(other_df, on="ham", how="left", coalesce=True)
shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
│ 3   ┆ 8.0 ┆ c   ┆ null  │
└─────┴─────┴─────┴───────┘

>>> df.join(other_df, on="ham", how="semi")
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
└─────┴─────┴─────┘

>>> df.join(other_df, on="ham", how="anti")
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘