polars.DataFrame.join#
- DataFrame.join(
- other: DataFrame,
- on: str | Expr | Sequence[str | Expr] | None = None,
- how: JoinStrategy = 'inner',
- *,
- left_on: str | Expr | Sequence[str | Expr] | None = None,
- right_on: str | Expr | Sequence[str | Expr] | None = None,
- suffix: str = '_right',
- validate: JoinValidation = 'm:m',
- join_nulls: bool = False,
- coalesce: bool | None = None,
Join in SQL-like fashion.
- Parameters:
- other
DataFrame to join with.
- on
Name(s) of the join columns in both DataFrames.
- how{‘inner’, ‘left’, ‘full’, ‘semi’, ‘anti’, ‘cross’}
Join strategy.
- inner
Returns rows that have matching values in both tables
- left
Returns all rows from the left table, and the matched rows from the right table
- full
Returns all rows when there is a match in either left or right table
- cross
Returns the Cartesian product of rows from both tables
- semi
Filter rows that have a match in the right table.
- anti
Filter rows that do not have a match in the right table.
Note
A left join preserves the row order of the left DataFrame.
- left_on
Name(s) of the left join column(s).
- right_on
Name(s) of the right join column(s).
- suffix
Suffix to append to columns with a duplicate name.
- validate: {‘m:m’, ‘m:1’, ‘1:m’, ‘1:1’}
Checks if join is of specified type.
- many_to_many
“m:m”: default, does not result in checks
- one_to_one
“1:1”: check if join keys are unique in both left and right datasets
- one_to_many
“1:m”: check if join keys are unique in left dataset
- many_to_one
“m:1”: check if join keys are unique in right dataset
Note
This is currently not supported by the streaming engine.
- join_nulls
Join on null values. By default null values will never produce matches.
- coalesce
Coalescing behavior (merging of join columns).
None: -> join specific.
True: -> Always coalesce join columns.
False: -> Never coalesce join columns.
- Returns:
- DataFrame
See also
Notes
For joining on columns with categorical data, see
polars.StringCache
.Examples
>>> df = pl.DataFrame( ... { ... "foo": [1, 2, 3], ... "bar": [6.0, 7.0, 8.0], ... "ham": ["a", "b", "c"], ... } ... ) >>> other_df = pl.DataFrame( ... { ... "apple": ["x", "y", "z"], ... "ham": ["a", "b", "d"], ... } ... ) >>> df.join(other_df, on="ham") shape: (2, 4) ┌─────┬─────┬─────┬───────┐ │ foo ┆ bar ┆ ham ┆ apple │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str │ ╞═════╪═════╪═════╪═══════╡ │ 1 ┆ 6.0 ┆ a ┆ x │ │ 2 ┆ 7.0 ┆ b ┆ y │ └─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="full") shape: (4, 5) ┌──────┬──────┬──────┬───────┬───────────┐ │ foo ┆ bar ┆ ham ┆ apple ┆ ham_right │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str ┆ str │ ╞══════╪══════╪══════╪═══════╪═══════════╡ │ 1 ┆ 6.0 ┆ a ┆ x ┆ a │ │ 2 ┆ 7.0 ┆ b ┆ y ┆ b │ │ null ┆ null ┆ null ┆ z ┆ d │ │ 3 ┆ 8.0 ┆ c ┆ null ┆ null │ └──────┴──────┴──────┴───────┴───────────┘
>>> df.join(other_df, on="ham", how="left", coalesce=True) shape: (3, 4) ┌─────┬─────┬─────┬───────┐ │ foo ┆ bar ┆ ham ┆ apple │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str ┆ str │ ╞═════╪═════╪═════╪═══════╡ │ 1 ┆ 6.0 ┆ a ┆ x │ │ 2 ┆ 7.0 ┆ b ┆ y │ │ 3 ┆ 8.0 ┆ c ┆ null │ └─────┴─────┴─────┴───────┘
>>> df.join(other_df, on="ham", how="semi") shape: (2, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str │ ╞═════╪═════╪═════╡ │ 1 ┆ 6.0 ┆ a │ │ 2 ┆ 7.0 ┆ b │ └─────┴─────┴─────┘
>>> df.join(other_df, on="ham", how="anti") shape: (1, 3) ┌─────┬─────┬─────┐ │ foo ┆ bar ┆ ham │ │ --- ┆ --- ┆ --- │ │ i64 ┆ f64 ┆ str │ ╞═════╪═════╪═════╡ │ 3 ┆ 8.0 ┆ c │ └─────┴─────┴─────┘