polars.DataFrame.join#

DataFrame.join(
other: DataFrame,
on: str | Expr | Sequence[str | Expr] | None = None,
how: JoinStrategy = 'inner',
*,
left_on: str | Expr | Sequence[str | Expr] | None = None,
right_on: str | Expr | Sequence[str | Expr] | None = None,
suffix: str = '_right',
validate: JoinValidation = 'm:m',
join_nulls: bool = False,
coalesce: bool | None = None,
) DataFrame[source]#

Join in SQL-like fashion.

Parameters:
other

DataFrame to join with.

on

Name(s) of the join columns in both DataFrames.

how{‘inner’, ‘left’, ‘full’, ‘semi’, ‘anti’, ‘cross’}

Join strategy.

  • inner

    Returns rows that have matching values in both tables

  • left

    Returns all rows from the left table, and the matched rows from the right table

  • full

    Returns all rows when there is a match in either left or right table

  • cross

    Returns the Cartesian product of rows from both tables

  • semi

    Filter rows that have a match in the right table.

  • anti

    Filter rows that do not have a match in the right table.

Note

A left join preserves the row order of the left DataFrame.

left_on

Name(s) of the left join column(s).

right_on

Name(s) of the right join column(s).

suffix

Suffix to append to columns with a duplicate name.

validate: {‘m:m’, ‘m:1’, ‘1:m’, ‘1:1’}

Checks if join is of specified type.

  • many_to_many

    “m:m”: default, does not result in checks

  • one_to_one

    “1:1”: check if join keys are unique in both left and right datasets

  • one_to_many

    “1:m”: check if join keys are unique in left dataset

  • many_to_one

    “m:1”: check if join keys are unique in right dataset

Note

This is currently not supported by the streaming engine.

join_nulls

Join on null values. By default null values will never produce matches.

coalesce

Coalescing behavior (merging of join columns).

  • None: -> join specific.

  • True: -> Always coalesce join columns.

  • False: -> Never coalesce join columns.

Returns:
DataFrame

See also

join_asof

Notes

For joining on columns with categorical data, see polars.StringCache.

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> other_df = pl.DataFrame(
...     {
...         "apple": ["x", "y", "z"],
...         "ham": ["a", "b", "d"],
...     }
... )
>>> df.join(other_df, on="ham")
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="full")
shape: (4, 5)
┌──────┬──────┬──────┬───────┬───────────┐
│ foo  ┆ bar  ┆ ham  ┆ apple ┆ ham_right │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---       │
│ i64  ┆ f64  ┆ str  ┆ str   ┆ str       │
╞══════╪══════╪══════╪═══════╪═══════════╡
│ 1    ┆ 6.0  ┆ a    ┆ x     ┆ a         │
│ 2    ┆ 7.0  ┆ b    ┆ y     ┆ b         │
│ null ┆ null ┆ null ┆ z     ┆ d         │
│ 3    ┆ 8.0  ┆ c    ┆ null  ┆ null      │
└──────┴──────┴──────┴───────┴───────────┘
>>> df.join(other_df, on="ham", how="left", coalesce=True)
shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
│ 3   ┆ 8.0 ┆ c   ┆ null  │
└─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="semi")
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
└─────┴─────┴─────┘
>>> df.join(other_df, on="ham", how="anti")
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘