Aggregate#
Function |
Description |
---|---|
Returns the average (mean) of all the elements in the grouping. |
|
Returns the amount of elements in the grouping. |
|
Returns the first element of the grouping. |
|
Returns the last element of the grouping. |
|
Returns the greatest (maximum) of all the elements in the grouping. |
|
Returns the median element from the grouping. |
|
Returns the smallest (minimum) of all the elements in the grouping. |
|
Returns the continuous quantile element from the grouping (interpolated value between two closest values). |
|
Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls. |
|
Returns the standard deviation of all the elements in the grouping. |
|
Returns the sum of all the elements in the grouping. |
|
Returns the variance of all the elements in the grouping. |
AVG#
Returns the average (mean) of all the elements in the grouping.
Example:
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT AVG(bar) AS bar_avg FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_avg │
# │ --- │
# │ f64 │
# ╞═════════╡
# │ 25.0 │
# └─────────┘
COUNT#
Returns the amount of elements in the grouping.
Example:
df = pl.DataFrame(
{
"foo": ["b", "a", "b", "c"],
"bar": [20, 10, 30, 40]
}
)
df.sql("""
SELECT
COUNT(bar) AS n_bar,
COUNT(DISTINCT foo) AS n_foo_unique
FROM self
""")
# shape: (1, 2)
# ┌───────┬──────────────┐
# │ n_bar ┆ n_foo_unique │
# │ --- ┆ --- │
# │ u32 ┆ u32 │
# ╞═══════╪══════════════╡
# │ 4 ┆ 3 │
# └───────┴──────────────┘
FIRST#
Returns the first element of the grouping.
Example:
df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
SELECT FIRST(foo) AS ff FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ ff │
# │ --- │
# │ str │
# ╞═════╡
# │ b │
# └─────┘
LAST#
Returns the last element of the grouping.
Example:
df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
SELECT LAST(foo) AS lf FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ lf │
# │ --- │
# │ str │
# ╞═════╡
# │ c │
# └─────┘
MAX#
Returns the greatest (maximum) of all the elements in the grouping.
Example:
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MAX(bar) AS bar_max FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_max │
# │ --- │
# │ i64 │
# ╞═════════╡
# │ 40 │
# └─────────┘
MEDIAN#
Returns the median element from the grouping.
Example:
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MEDIAN(bar) AS bar_median FROM self
""")
# shape: (1, 1)
# ┌────────────┐
# │ bar_median │
# │ --- │
# │ f64 │
# ╞════════════╡
# │ 25.0 │
# └────────────┘
MIN#
Returns the smallest (minimum) of all the elements in the grouping.
Example:
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MIN(bar) AS bar_min FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_min │
# │ --- │
# │ i64 │
# ╞═════════╡
# │ 10 │
# └─────────┘
QUANTILE_CONT#
Returns the continuous quantile element from the grouping (interpolated value between two closest values).
Example:
df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
SELECT
QUANTILE_CONT(foo, 0.25) AS foo_q25,
QUANTILE_CONT(foo, 0.50) AS foo_q50,
QUANTILE_CONT(foo, 0.75) AS foo_q75,
FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═════════╪═════════╡
# │ 10.0 ┆ 25.0 ┆ 47.5 │
# └─────────┴─────────┴─────────┘
QUANTILE_DISC#
Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls.
Example:
df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
SELECT
QUANTILE_DISC(foo, 0.25) AS foo_q25,
QUANTILE_DISC(foo, 0.50) AS foo_q50,
QUANTILE_DISC(foo, 0.75) AS foo_q75,
FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═════════╪═════════╡
# │ 10.0 ┆ 20.0 ┆ 40.0 │
# └─────────┴─────────┴─────────┘
STDDEV#
Returns the sample standard deviation of all the elements in the grouping.
Aliases
STDEV
, STDEV_SAMP
, STDDEV_SAMP
Example:
df = pl.DataFrame(
{
"foo": [10, 20, 8],
"bar": [10, 7, 18],
}
)
df.sql("""
SELECT STDDEV(foo) AS foo_std, STDDEV(bar) AS bar_std FROM self
""")
# shape: (1, 2)
# ┌──────────┬──────────┐
# │ foo_std ┆ bar_std │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞══════════╪══════════╡
# │ 6.429101 ┆ 5.686241 │
# └──────────┴──────────┘
SUM#
Returns the sum of all the elements in the grouping.
Example:
df = pl.DataFrame(
{
"foo": [1, 2, 3],
"bar": [6, 7, 8],
"ham": ["a", "b", "c"],
}
)
df.sql("""
SELECT SUM(foo) AS foo_sum, SUM(bar) AS bar_sum FROM self
""")
# shape: (1, 2)
# ┌─────────┬─────────┐
# │ foo_sum ┆ bar_sum │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════════╪═════════╡
# │ 6 ┆ 21 │
# └─────────┴─────────┘
VARIANCE#
Returns the variance of all the elements in the grouping.
Aliases
VAR
, VAR_SAMP
Example:
df = pl.DataFrame(
{
"foo": [10, 20, 8],
"bar": [10, 7, 18],
}
)
df.sql("""
SELECT VARIANCE(foo) AS foo_var, VARIANCE(bar) AS bar_var FROM self
""")
# shape: (1, 2)
# ┌───────────┬───────────┐
# │ foo_var ┆ bar_var │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞═══════════╪═══════════╡
# │ 41.333333 ┆ 32.333333 │
# └───────────┴───────────┘