Set Operations#
Function |
Description |
|---|---|
Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second. |
|
Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second, aligning columns by name instead of by ordinal position. |
|
Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets. |
|
Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets, aligning columns by name instead of by ordinal position. |
|
Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows. |
|
Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query. |
|
Combine the result sets of two or more SELECT statements, aligning columns
by name instead of by ordinal position; if |
EXCEPT#
Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second.
Example:
lf1 = pl.LazyFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
"id": [2, 3, 4],
"age": [30, 25, 45],
"name": ["Bob", "Charlie", "David"],
})
pl.sql("""
SELECT id, name FROM lf1
EXCEPT
SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (1, 2)
# ┌─────┬───────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═══════╡
# │ 1 ┆ Alice │
# └─────┴───────┘
EXCEPT BY NAME#
Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second, aligning columns by name instead of by ordinal position. This allows the two queries to have different column orders or different numbers of columns (only commonly-named columns are used for the comparison).
Example:
lf1 = pl.LazyFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
"name": ["Bob", "Charlie", "David"],
"id": [2, 3, 4],
})
pl.sql("""
SELECT * FROM lf1
EXCEPT BY NAME
SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (1, 2)
# ┌─────┬───────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═══════╡
# │ 1 ┆ Alice │
# └─────┴───────┘
INTERSECT#
Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets.
Example:
pl.sql("""
SELECT id, name FROM lf1
INTERSECT
SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════════╡
# │ 2 ┆ Bob │
# │ 3 ┆ Charlie │
# └─────┴─────────┘
INTERSECT BY NAME#
Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets, aligning columns by name instead of by ordinal position. This allows the two queries to have different column orders or different numbers of columns (only commonly-named columns are used for the comparison).
Example:
lf1 = pl.LazyFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
"name": ["Bob", "Charlie", "David"],
"id": [2, 3, 4],
})
pl.sql("""
SELECT * FROM lf1
INTERSECT BY NAME
SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════════╡
# │ 2 ┆ Bob │
# │ 3 ┆ Charlie │
# └─────┴─────────┘
UNION#
Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.
Example:
pl.sql("""
SELECT id, name FROM lf1
UNION
SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (4, 2)
# ┌─────┬─────────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════════╡
# │ 1 ┆ Alice │
# │ 2 ┆ Bob │
# │ 3 ┆ Charlie │
# │ 4 ┆ David │
# └─────┴─────────┘
UNION ALL#
Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.
Example:
pl.sql("""
SELECT id, name FROM lf1
UNION ALL
SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (6, 2)
# ┌─────┬─────────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════════╡
# │ 1 ┆ Alice │
# │ 2 ┆ Bob │
# │ 2 ┆ Bob │
# │ 3 ┆ Charlie │
# │ 3 ┆ Charlie │
# │ 4 ┆ David │
# └─────┴─────────┘
UNION BY NAME#
Combine the result sets of two or more SELECT statements, aligning columns
by name instead of by ordinal position; if ALL is omitted the final result
will have no duplicate rows. This also combines columns from both datasets.
Example:
pl.sql("""
SELECT * FROM lf1
UNION BY NAME
SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (6, 3)
# ┌─────┬─────────┬──────┐
# │ id ┆ name ┆ age │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ i64 │
# ╞═════╪═════════╪══════╡
# │ 1 ┆ Alice ┆ null │
# │ 2 ┆ Bob ┆ null │
# │ 2 ┆ Bob ┆ 30 │
# │ 3 ┆ Charlie ┆ 25 │
# │ 3 ┆ Charlie ┆ null │
# │ 4 ┆ David ┆ 45 │
# └─────┴─────────┴──────┘