polars.DataFrame.join#
- DataFrame.join(
- other: DataFrame,
- on: str | Expr | Sequence[str | Expr] | None = None,
- how: JoinStrategy = 'inner',
- *,
- left_on: str | Expr | Sequence[str | Expr] | None = None,
- right_on: str | Expr | Sequence[str | Expr] | None = None,
- suffix: str = '_right',
- validate: JoinValidation = 'm:m',
- nulls_equal: bool = False,
- coalesce: bool | None = None,
- maintain_order: MaintainOrderJoin | None = None,
Join in SQL-like fashion.
Changed in version 1.24: The
join_nulls
parameter was renamednulls_equal
.- Parameters:
- other
DataFrame to join with.
- on
Name(s) of the join columns in both DataFrames. If set,
left_on
andright_on
should be None. This should not be specified ifhow='cross'
.- how{‘inner’, ‘left’, ‘right’, ‘full’, ‘semi’, ‘anti’, ‘cross’}
Join strategy.
inner
(Default) Returns rows that have matching values in both tables.
left
Returns all rows from the left table, and the matched rows from the right table.
full
Returns all rows when there is a match in either left or right.
cross
Returns the Cartesian product of rows from both tables
semi
Returns rows from the left table that have a match in the right table.
anti
Returns rows from the left table that have no match in the right table.
- left_on
Name(s) of the left join column(s).
- right_on
Name(s) of the right join column(s).
- suffix
Suffix to append to columns with a duplicate name.
- validate: {‘m:m’, ‘m:1’, ‘1:m’, ‘1:1’}
Checks if join is of specified type.
m:m
(Default) Many-to-many (default). Does not result in checks.
1:1
One-to-one. Checks if join keys are unique in both left and right datasets.
1:m
One-to-many. Checks if join keys are unique in left dataset.
m:1
Many-to-one. Check if join keys are unique in right dataset.
Note
This is currently not supported by the streaming engine.
- nulls_equal
Join on null values. By default null values will never produce matches.
- coalesce
Coalescing behavior (merging of join columns).
None
(Default) Coalesce unless
how='full'
is specified.True
Always coalesce join columns.
False
Never coalesce join columns.
Note
Joining on any other expressions than
col
will turn off coalescing.- maintain_order{‘none’, ‘left’, ‘right’, ‘left_right’, ‘right_left’}
Which DataFrame row order to preserve, if any. Do not rely on any observed ordering without explicitly setting this parameter, as your code may break in a future release. Not specifying any ordering can improve performance. Supported for inner, left, right and full joins
none
(Default) No specific ordering is desired. The ordering might differ across Polars versions or even between different runs.
left
Preserves the order of the left DataFrame.
right
Preserves the order of the right DataFrame.
left_right
First preserves the order of the left DataFrame, then the right.
right_left
First preserves the order of the right DataFrame, then the left.
See also
Notes
For joining on columns with categorical data, see
polars.StringCache
.Examples
>>> df = pl.DataFrame( ... { ... "foo": [1, 2, 3], ... "bar": [6.0, 7.0, 8.0], ... "ham": ["a", "b", "c"], ... } ... ) >>> other_df = pl.DataFrame( ... { ... "apple": ["x", "y", "z"], ... "ham": ["a", "b", "d"], ... } ... ) >>> df.join(other_df, on="ham") shape: (2, 4) ┌─────┬─────┬─────┬───────┐ │ foo ┆ bar ┆ ham ┆ apple │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str │ ╞═════╪═════╪═════╪═══════╡ │ 1 ┆ 6.0 ┆ a ┆ x │ │ 2 ┆ 7.0 ┆ b ┆ y │ └─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="full") shape: (4, 5) ┌──────┬──────┬──────┬───────┬───────────┐ │ foo ┆ bar ┆ ham ┆ apple ┆ ham_right │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str ┆ str │ ╞══════╪══════╪══════╪═══════╪═══════════╡ │ 1 ┆ 6.0 ┆ a ┆ x ┆ a │ │ 2 ┆ 7.0 ┆ b ┆ y ┆ b │ │ null ┆ null ┆ null ┆ z ┆ d │ │ 3 ┆ 8.0 ┆ c ┆ null ┆ null │ └──────┴──────┴──────┴───────┴───────────┘
>>> df.join(other_df, on="ham", how="full", coalesce=True) shape: (4, 4) ┌──────┬──────┬─────┬───────┐ │ foo ┆ bar ┆ ham ┆ apple │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str │ ╞══════╪══════╪═════╪═══════╡ │ 1 ┆ 6.0 ┆ a ┆ x │ │ 2 ┆ 7.0 ┆ b ┆ y │ │ null ┆ null ┆ d ┆ z │ │ 3 ┆ 8.0 ┆ c ┆ null │ └──────┴──────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="left") shape: (3, 4) ┌─────┬─────┬─────┬───────┐ │ foo ┆ bar ┆ ham ┆ apple │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str │ ╞═════╪═════╪═════╪═══════╡ │ 1 ┆ 6.0 ┆ a ┆ x │ │ 2 ┆ 7.0 ┆ b ┆ y │ │ 3 ┆ 8.0 ┆ c ┆ null │ └─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="semi") shape: (2, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6.0 ┆ a │ │ 2 ┆ 7.0 ┆ b │ └─────┴─────┴─────┘
>>> df.join(other_df, on="ham", how="anti") shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str │ ╞═════╪═════╪═════╡ │ 3 ┆ 8.0 ┆ c │ └─────┴─────┴─────┘
>>> df.join(other_df, how="cross") shape: (9, 5) ┌─────┬─────┬─────┬───────┬───────────┐ │ foo ┆ bar ┆ ham ┆ apple ┆ ham_right │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str ┆ str │ ╞═════╪═════╪═════╪═══════╪═══════════╡ │ 1 ┆ 6.0 ┆ a ┆ x ┆ a │ │ 1 ┆ 6.0 ┆ a ┆ y ┆ b │ │ 1 ┆ 6.0 ┆ a ┆ z ┆ d │ │ 2 ┆ 7.0 ┆ b ┆ x ┆ a │ │ 2 ┆ 7.0 ┆ b ┆ y ┆ b │ │ 2 ┆ 7.0 ┆ b ┆ z ┆ d │ │ 3 ┆ 8.0 ┆ c ┆ x ┆ a │ │ 3 ┆ 8.0 ┆ c ┆ y ┆ b │ │ 3 ┆ 8.0 ┆ c ┆ z ┆ d │ └─────┴─────┴─────┴───────┴───────────┘