Set Operations#

Function

Description

EXCEPT

Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second.

INTERSECT

Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets.

UNION

Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.

UNION ALL

Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.

UNION [ALL] BY NAME

Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

EXCEPT#

Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second.

Example:

lf1 = pl.LazyFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
    "id": [2, 3, 4],
    "age": [30, 25, 45],
    "name": ["Bob", "Charlie", "David"],
})
pl.sql("""
    SELECT id, name FROM lf1
    EXCEPT
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (1, 2)
# ┌─────┬───────┐
# │ id  ┆ name  │
# │ --- ┆ ---   │
# │ i64 ┆ str   │
# ╞═════╪═══════╡
# │ 1   ┆ Alice │
# └─────┴───────┘

INTERSECT#

Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets.

Example:

pl.sql("""
    SELECT id, name FROM lf1
    INTERSECT
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# └─────┴─────────┘

UNION#

Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.

Example:

pl.sql("""
    SELECT id, name FROM lf1
    UNION
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (4, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

UNION ALL#

Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.

Example:

pl.sql("""
    SELECT id, name FROM lf1
    UNION ALL
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (6, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

UNION BY NAME#

Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

Example:

pl.sql("""
    SELECT * FROM lf1
    UNION BY NAME
    SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (6, 3)
# ┌─────┬─────────┬──────┐
# │ id  ┆ name    ┆ age  │
# │ --- ┆ ---     ┆ ---  │
# │ i64 ┆ str     ┆ i64  │
# ╞═════╪═════════╪══════╡
# │ 1   ┆ Alice   ┆ null │
# │ 2   ┆ Bob     ┆ null │
# │ 2   ┆ Bob     ┆ 30   │
# │ 3   ┆ Charlie ┆ 25   │
# │ 3   ┆ Charlie ┆ null │
# │ 4   ┆ David   ┆ 45   │
# └─────┴─────────┴──────┘