Window#
Function |
Description |
|---|---|
Returns the rank of each row within a window partition, without gaps for ties. |
|
Returns the first value in an ordered set of values with respect to the window declared in |
|
Returns the last value in an ordered set of values with respect to the window declared in |
|
Define a window (a set of rows) within which a function is applied. |
|
Returns the rank of each row within a window partition, with gaps for ties. |
|
Returns the sequential row number within a window partition, starting from 1. |
Note
As a DataFrame engine Polars defaults to ROWS framing semantics for window functions when an explicit
window specification is omitted; specifically, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This
differs from the default RANGE framing semantics typically used by database engines.
DENSE_RANK#
Returns the rank of each row within a window partition, without gaps for ties. Rows with equal values receive the same rank, and the next rank number is consecutive (no gaps).
Requirements:
Must be used with an
OVERclause.That clause must have
ORDER BYin the window specification.
Example:
df = pl.DataFrame({
"id": [1, 2, 3, 4, 5, 6],
"category": ["A", "A", "A", "B", "B", "B"],
"score": [85, 90, 90, 75, 80, 80]
})
df.sql("""
SELECT
id,
category,
score,
RANK() OVER (PARTITION BY category ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY category ORDER BY score DESC) AS dense_rank
FROM self
ORDER BY category, score DESC
""")
# shape: (6, 5)
# ┌─────┬──────────┬───────┬──────┬────────────┐
# │ id ┆ category ┆ score ┆ rank ┆ dense_rank │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ i64 ┆ u32 ┆ u32 │
# ╞═════╪══════════╪═══════╪══════╪════════════╡
# │ 2 ┆ A ┆ 90 ┆ 1 ┆ 1 │
# │ 3 ┆ A ┆ 90 ┆ 1 ┆ 1 │
# │ 1 ┆ A ┆ 85 ┆ 3 ┆ 2 │
# │ 5 ┆ B ┆ 80 ┆ 1 ┆ 1 │
# │ 6 ┆ B ┆ 80 ┆ 1 ┆ 1 │
# │ 4 ┆ B ┆ 75 ┆ 3 ┆ 2 │
# └─────┴──────────┴───────┴──────┴────────────┘
FIRST_VALUE#
Returns the first value in an ordered set of values with respect to the window declared in OVER.
LAST_VALUE#
Returns the last value in an ordered set of values with respect to the window declared in OVER.
RANK#
Returns the rank of each row within a window partition, with gaps for ties. Rows with equal values receive the same rank, and the next rank skips numbers (creating gaps).
Requirements:
Must be used with an
OVERclause.That clause must have
ORDER BYin the window specification.
Example:
df = pl.DataFrame({
"id": [1, 2, 3, 4, 5, 6],
"category": ["A", "A", "A", "B", "B", "B"],
"score": [85, 90, 90, 75, 80, 80]
})
df.sql("""
SELECT
id,
category,
score,
DENSE_RANK() OVER (PARTITION BY category ORDER BY score DESC) AS dense_rank,
RANK() OVER (PARTITION BY category ORDER BY score DESC) AS rank
FROM self
ORDER BY category, score DESC
""")
# shape: (6, 5)
# ┌─────┬──────────┬───────┬────────────┬──────┐
# │ id ┆ category ┆ score ┆ dense_rank ┆ rank │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ i64 ┆ u32 ┆ u32 │
# ╞═════╪══════════╪═══════╪════════════╪══════╡
# │ 2 ┆ A ┆ 90 ┆ 1 ┆ 1 │
# │ 3 ┆ A ┆ 90 ┆ 1 ┆ 1 │
# │ 1 ┆ A ┆ 85 ┆ 2 ┆ 3 │2)
# │ 5 ┆ B ┆ 80 ┆ 1 ┆ 1 │
# │ 6 ┆ B ┆ 80 ┆ 1 ┆ 1 │
# │ 4 ┆ B ┆ 75 ┆ 2 ┆ 3 │2)
# └─────┴──────────┴───────┴────────────┴──────┘
ROW_NUMBER#
Returns the sequential row number, optionally within a window partition, starting from 1. Unlike
RANK and DENSE_RANK, ROW_NUMBER always returns unique numbers even when values are tied.
Example:
df = pl.DataFrame({
"id": [1, 2, 3, 4, 5, 6],
"category": ["A", "A", "A", "B", "B", "B"],
"value": [100, 200, 200, 150, 300, 150]
})
df.sql("""
SELECT
ROW_NUMBER() AS x,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY id) AS y,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY id DESC) AS z,
category,
value
FROM self
ORDER BY category, id
""")
# shape: (6, 5)
# ┌─────┬─────┬─────┬──────────┬───────┐
# │ x ┆ y ┆ z ┆ category ┆ value │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ u32 ┆ u32 ┆ u32 ┆ str ┆ i64 │
# ╞═════╪═════╪═════╪══════════╪═══════╡
# │ 1 ┆ 1 ┆ 3 ┆ A ┆ 100 │
# │ 2 ┆ 2 ┆ 2 ┆ A ┆ 200 │
# │ 3 ┆ 3 ┆ 1 ┆ A ┆ 200 │
# │ 4 ┆ 1 ┆ 3 ┆ B ┆ 150 │
# │ 5 ┆ 2 ┆ 2 ┆ B ┆ 300 │
# │ 6 ┆ 3 ┆ 1 ┆ B ┆ 150 │
# └─────┴─────┴─────┴──────────┴───────┘
OVER#
Used to define a window (a set of rows) within which a function is applied.
Notes:
As a DataFrame engine Polars defaults to ROWS framing semantics for window
functions when an explicit window specification is omitted; specifically,
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This differs from the
default RANGE framing semantics typically used by database engines.
Example:
df = pl.DataFrame(
{
"idx": [0, 1, 2, 3, 4, 5, 6],
"label": ["aaa", "aaa", "bbb", "bbb", "aaa", "ccc", "aaa"],
"value": [10, 20, 30, 40, 50, -5, 0],
}
)
df.sql("""
SELECT
*,
FIRST_VALUE(value) OVER (
PARTITION BY label ORDER BY idx
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS first_val,
LAST_VALUE(value) OVER (
PARTITION BY label ORDER BY idx
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS last_val,
SUM(value) OVER (
PARTITION BY label ORDER BY idx
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total_by_label
FROM self
ORDER BY label, idx
""")
# shape: (7, 6)
# ┌─────┬───────┬───────┬───────────┬──────────┬────────────────────────┐
# │ idx ┆ label ┆ value ┆ first_val ┆ last_val ┆ running_total_by_label │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
# ╞═════╪═══════╪═══════╪═══════════╪══════════╪════════════════════════╡
# │ 0 ┆ aaa ┆ 10 ┆ 10 ┆ 10 ┆ 10 │
# │ 1 ┆ aaa ┆ 20 ┆ 10 ┆ 20 ┆ 30 │
# │ 4 ┆ aaa ┆ 50 ┆ 10 ┆ 50 ┆ 80 │
# │ 6 ┆ aaa ┆ 0 ┆ 10 ┆ 0 ┆ 80 │
# │ 2 ┆ bbb ┆ 30 ┆ 30 ┆ 30 ┆ 30 │
# │ 3 ┆ bbb ┆ 40 ┆ 30 ┆ 40 ┆ 70 │
# │ 5 ┆ ccc ┆ -5 ┆ -5 ┆ -5 ┆ -5 │
# └─────┴───────┴───────┴───────────┴──────────┴────────────────────────┘