polars.DataFrame.iter_slices
- DataFrame.iter_slices(n_rows: int = 10000) → Iterator[DataFrame]
Returns a non-copying iterator of slices over the underlying DataFrame.
- Parameters:
- n_rows
Determines the number of rows contained in each DataFrame slice.
See also
iter_rows
Row iterator over frame data (does not materialise all rows).
partition_by
Split into multiple DataFrames, partitioned by groups.
Examples
>>> from datetime import date
>>> df = pl.DataFrame(
...     data={
...         "a": range(17_500),
...         "b": date(2023, 1, 1),
...         "c": "klmnoopqrstuvwxyz",
...     },
...     schema_overrides={"a": pl.Int32},
... )
>>> for idx, frame in enumerate(df.iter_slices()):
...     print(f"{type(frame).__name__}:[{idx}]:{len(frame)}")
...
DataFrame:[0]:10000
DataFrame:[1]:7500
Using iter_slices is an efficient way to chunk-iterate over DataFrames and any supported frame export/conversion types; for example, as RecordBatches:
>>> for frame in df.iter_slices(n_rows=15_000):
...     record_batch = frame.to_arrow().to_batches()[0]
...     print(record_batch, "\n<< ", len(record_batch))
...
pyarrow.RecordBatch
a: int32
b: date32[day]
c: large_string
<< 15000
pyarrow.RecordBatch
a: int32
b: date32[day]
c: large_string
<< 2500