Aggregate#

Function	Description
AVG	Returns the average (mean) of all the elements in the grouping.
COUNT	Returns the amount of elements in the grouping.
FIRST	Returns the first element of the grouping.
LAST	Returns the last element of the grouping.
MAX	Returns the greatest (maximum) of all the elements in the grouping.
MEDIAN	Returns the median element from the grouping.
MIN	Returns the smallest (minimum) of all the elements in the grouping.
QUANTILE_CONT	Returns the continuous quantile element from the grouping (interpolated value between two closest values).
QUANTILE_DISC	Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls.
STDDEV	Returns the standard deviation of all the elements in the grouping.
SUM	Returns the sum of all the elements in the grouping.
VARIANCE	Returns the variance of all the elements in the grouping.

AVG#

Returns the average (mean) of all the elements in the grouping.

Example:

df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
  SELECT AVG(bar) AS bar_avg FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_avg │
# │ ---     │
# │ f64     │
# ╞═════════╡
# │ 25.0    │
# └─────────┘

COUNT#

Returns the amount of elements in the grouping.

Example:

df = pl.DataFrame(
  {
    "foo": ["b", "a", "b", "c"],
    "bar": [20, 10, 30, 40]
  }
)
df.sql("""
  SELECT
    COUNT(bar) AS n_bar,
    COUNT(DISTINCT foo) AS n_foo_unique
  FROM self
""")

# shape: (1, 2)
# ┌───────┬──────────────┐
# │ n_bar ┆ n_foo_unique │
# │ ---   ┆ ---          │
# │ u32   ┆ u32          │
# ╞═══════╪══════════════╡
# │ 4     ┆ 3            │
# └───────┴──────────────┘

FIRST#

Returns the first element of the grouping.

Example:

df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
  SELECT FIRST(foo) AS ff FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ ff  │
# │ --- │
# │ str │
# ╞═════╡
# │ b   │
# └─────┘

LAST#

Returns the last element of the grouping.

Example:

df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
  SELECT LAST(foo) AS lf FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ lf  │
# │ --- │
# │ str │
# ╞═════╡
# │ c   │
# └─────┘

MAX#

Returns the greatest (maximum) of all the elements in the grouping.

Example:

df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
  SELECT MAX(bar) AS bar_max FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_max │
# │ ---     │
# │ i64     │
# ╞═════════╡
# │ 40      │
# └─────────┘

MEDIAN#

Returns the median element from the grouping.

Example:

df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
  SELECT MEDIAN(bar) AS bar_median FROM self
""")
# shape: (1, 1)
# ┌────────────┐
# │ bar_median │
# │ ---        │
# │ f64        │
# ╞════════════╡
# │ 25.0       │
# └────────────┘

MIN#

Returns the smallest (minimum) of all the elements in the grouping.

Example:

df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
  SELECT MIN(bar) AS bar_min FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_min │
# │ ---     │
# │ i64     │
# ╞═════════╡
# │ 10      │
# └─────────┘

QUANTILE_CONT#

Returns the continuous quantile element from the grouping (interpolated value between two closest values).

Example:

df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
  SELECT
    QUANTILE_CONT(foo, 0.25) AS foo_q25,
    QUANTILE_CONT(foo, 0.50) AS foo_q50,
    QUANTILE_CONT(foo, 0.75) AS foo_q75,
  FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ ---     ┆ ---     ┆ ---     │
# │ f64     ┆ f64     ┆ f64     │
# ╞═════════╪═════════╪═════════╡
# │ 10.0    ┆ 25.0    ┆ 47.5    │
# └─────────┴─────────┴─────────┘

QUANTILE_DISC#

Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls.

Example:

df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
  SELECT
    QUANTILE_DISC(foo, 0.25) AS foo_q25,
    QUANTILE_DISC(foo, 0.50) AS foo_q50,
    QUANTILE_DISC(foo, 0.75) AS foo_q75,
  FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ ---     ┆ ---     ┆ ---     │
# │ f64     ┆ f64     ┆ f64     │
# ╞═════════╪═════════╪═════════╡
# │ 10.0    ┆ 20.0    ┆ 40.0    │
# └─────────┴─────────┴─────────┘

STDDEV#

Returns the sample standard deviation of all the elements in the grouping.

Aliases

STDEV, STDEV_SAMP, STDDEV_SAMP

Example:

df = pl.DataFrame(
    {
        "foo": [10, 20, 8],
        "bar": [10, 7, 18],
    }
)
df.sql("""
  SELECT STDDEV(foo) AS foo_std, STDDEV(bar) AS bar_std FROM self
""")
# shape: (1, 2)
# ┌──────────┬──────────┐
# │ foo_std  ┆ bar_std  │
# │ ---      ┆ ---      │
# │ f64      ┆ f64      │
# ╞══════════╪══════════╡
# │ 6.429101 ┆ 5.686241 │
# └──────────┴──────────┘

SUM#

Returns the sum of all the elements in the grouping.

Example:

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6, 7, 8],
        "ham": ["a", "b", "c"],
    }
)
df.sql("""
  SELECT SUM(foo) AS foo_sum, SUM(bar) AS bar_sum FROM self
""")
# shape: (1, 2)
# ┌─────────┬─────────┐
# │ foo_sum ┆ bar_sum │
# │ ---     ┆ ---     │
# │ i64     ┆ i64     │
# ╞═════════╪═════════╡
# │ 6       ┆ 21      │
# └─────────┴─────────┘

VARIANCE#

Returns the variance of all the elements in the grouping.

Aliases

VAR, VAR_SAMP

Example:

df = pl.DataFrame(
    {
        "foo": [10, 20, 8],
        "bar": [10, 7, 18],
    }
)
df.sql("""
  SELECT VARIANCE(foo) AS foo_var, VARIANCE(bar) AS bar_var FROM self
""")
# shape: (1, 2)
# ┌───────────┬───────────┐
# │ foo_var   ┆ bar_var   │
# │ ---       ┆ ---       │
# │ f64       ┆ f64       │
# ╞═══════════╪═══════════╡
# │ 41.333333 ┆ 32.333333 │
# └───────────┴───────────┘