Set Operations#

Function

Description

UNION

Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.

UNION ALL

Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.

UNION [ALL] BY NAME

Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

UNION#

Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.

Example:

lf1 = pl.LazyFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
    "id": [2, 3, 4],
    "age": [30, 25, 45],
    "name": ["Bob", "Charlie", "David"],
})
lf_union = pl.sql("""
    SELECT id, name FROM df1
    UNION
    SELECT id, name FROM df2
""").sort(by="id").collect()

# shape: (4, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

UNION ALL#

Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.

Example:

lf_union_all = pl.sql("""
    SELECT id, name FROM df1
    UNION ALL
    SELECT id, name FROM df2
""").sort(by="id").collect()

# shape: (6, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

UNION BY NAME#

Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

Example:

lf_union_by_name = pl.sql("""
    SELECT * FROM df1
    UNION BY NAME
    SELECT * FROM df2
""").sort(by="id").collect()

# shape: (6, 3)
# ┌─────┬─────────┬──────┐
# │ id  ┆ name    ┆ age  │
# │ --- ┆ ---     ┆ ---  │
# │ i64 ┆ str     ┆ i64  │
# ╞═════╪═════════╪══════╡
# │ 1   ┆ Alice   ┆ null │
# │ 2   ┆ Bob     ┆ null │
# │ 2   ┆ Bob     ┆ 30   │
# │ 3   ┆ Charlie ┆ 25   │
# │ 3   ┆ Charlie ┆ null │
# │ 4   ┆ David   ┆ 45   │
# └─────┴─────────┴──────┘