polars.DataFrame.rolling#
- DataFrame.rolling(
- index_column: IntoExpr,
- *,
- period: str | timedelta,
- offset: str | timedelta | None = None,
- closed: ClosedInterval = 'right',
- group_by: IntoExpr | Iterable[IntoExpr] | None = None,
Create rolling groups based on a temporal or integer column.
Different from a
group_by_dynamic
the windows are now determined by the individual values and are not of constant intervals. For constant intervals useDataFrame.group_by_dynamic()
.If you have a time series
<t_0, t_1, ..., t_n>
, then by default the windows created will be(t_0 - period, t_0]
(t_1 - period, t_1]
…
(t_n - period, t_n]
whereas if you pass a non-default
offset
, then the windows will be(t_0 + offset, t_0 + offset + period]
(t_1 + offset, t_1 + offset + period]
…
(t_n + offset, t_n + offset + period]
The
period
andoffset
arguments are created either from a timedelta, or by using the following string language:1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 calendar day)
1w (1 calendar week)
1mo (1 calendar month)
1q (1 calendar quarter)
1y (1 calendar year)
1i (1 index count)
Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds
By “calendar day”, we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for “calendar week”, “calendar month”, “calendar quarter”, and “calendar year”.
- Parameters:
- index_column
Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order (or, if
group_by
is specified, then it must be sorted in ascending order within each group).In case of a rolling operation on indices, dtype needs to be one of {UInt32, UInt64, Int32, Int64}. Note that the first three get temporarily cast to Int64, so if performance matters use an Int64 column.
- period
Length of the window - must be non-negative.
- offset
Offset of the window. Default is
-period
.- closed{‘right’, ‘left’, ‘both’, ‘none’}
Define which sides of the temporal interval are closed (inclusive).
- group_by
Also group by this column/these columns
- Returns:
- RollingGroupBy
Object you can call
.agg
on to aggregate by groups, the result of which will be sorted byindex_column
(but note that ifgroup_by
columns are passed, it will only be sorted within each group).
See also
Examples
>>> dates = [ ... "2020-01-01 13:45:48", ... "2020-01-01 16:42:13", ... "2020-01-01 16:45:09", ... "2020-01-02 18:12:48", ... "2020-01-03 19:45:32", ... "2020-01-08 23:16:43", ... ] >>> df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_columns( ... pl.col("dt").str.strptime(pl.Datetime).set_sorted() ... ) >>> out = df.rolling(index_column="dt", period="2d").agg( ... [ ... pl.sum("a").alias("sum_a"), ... pl.min("a").alias("min_a"), ... pl.max("a").alias("max_a"), ... ] ... ) >>> assert out["sum_a"].to_list() == [3, 10, 15, 24, 11, 1] >>> assert out["max_a"].to_list() == [3, 7, 7, 9, 9, 1] >>> assert out["min_a"].to_list() == [3, 3, 3, 3, 2, 1] >>> out shape: (6, 4) ┌─────────────────────┬───────┬───────┬───────┐ │ dt ┆ sum_a ┆ min_a ┆ max_a │ │ --- ┆ --- ┆ --- ┆ --- │ │ datetime[μs] ┆ i64 ┆ i64 ┆ i64 │ ╞═════════════════════╪═══════╪═══════╪═══════╡ │ 2020-01-01 13:45:48 ┆ 3 ┆ 3 ┆ 3 │ │ 2020-01-01 16:42:13 ┆ 10 ┆ 3 ┆ 7 │ │ 2020-01-01 16:45:09 ┆ 15 ┆ 3 ┆ 7 │ │ 2020-01-02 18:12:48 ┆ 24 ┆ 3 ┆ 9 │ │ 2020-01-03 19:45:32 ┆ 11 ┆ 2 ┆ 9 │ │ 2020-01-08 23:16:43 ┆ 1 ┆ 1 ┆ 1 │ └─────────────────────┴───────┴───────┴───────┘
If you use an index count in
period
oroffset
, then it’s based on the values inindex_column
:>>> df = pl.DataFrame({"int": [0, 4, 5, 6, 8], "value": [1, 4, 2, 4, 1]}) >>> df.rolling("int", period="3i").agg(pl.col("int").alias("aggregated")) shape: (5, 2) ┌─────┬────────────┐ │ int ┆ aggregated │ │ --- ┆ --- │ │ i64 ┆ list[i64] │ ╞═════╪════════════╡ │ 0 ┆ [0] │ │ 4 ┆ [4] │ │ 5 ┆ [4, 5] │ │ 6 ┆ [4, 5, 6] │ │ 8 ┆ [6, 8] │ └─────┴────────────┘
If you want the index count to be based on row number, then you may want to combine
rolling
withwith_row_index()
.