polars.PartitionMaxSize
- class polars.PartitionMaxSize(
    base_path: str | Path,
    *,
    file_path: Callable[[BasePartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
    max_size: int,
  )
Partitioning scheme to write files with a maximum size.
This partitioning scheme generates files that each have a given maximum size. When the file currently being written reaches the maximum size, it is closed and a new file is opened.
Warning
This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- base_path
The base path for the output files.
- file_path
A callback to register or modify the output path for each partition relative to the base_path. The callback receives a polars.io.partition.BasePartitionContext that contains information about the partition.
If no callback is given, it defaults to {ctx.file_idx}.{EXT}.
- max_size : int
The maximum size in rows of each of the generated files.
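As a rough illustration of the file_path callback described above, the sketch below defines a naming function that reads only ctx.file_idx, the one context attribute named in the default pattern. FakeContext and numbered_csv_path are hypothetical names introduced here for illustration. The stand-in class exists only so the function can be exercised without writing any files.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeContext:
    """Hypothetical stand-in for the partition context; only file_idx is modelled."""

    file_idx: int


def numbered_csv_path(ctx) -> Path:
    # Return a path relative to base_path. The index is zero-padded so that
    # the generated files sort lexicographically in write order.
    return Path(f"part-{ctx.file_idx:05d}.csv")


# Exercise the callback with three consecutive file indices.
paths = [numbered_csv_path(FakeContext(i)) for i in range(3)]
print([str(p) for p in paths])
```

A function of this shape could presumably be passed as file_path=numbered_csv_path when constructing the partitioning scheme. Since the functionality is marked unstable, the attributes available on the real context object should be checked against the installed version.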
Examples
Split a Parquet file over smaller CSV files of at most 100 000 rows each:
>>> pl.scan_parquet("/path/to/file.parquet").sink_csv(
...     pl.PartitionMaxSize("./out", max_size=100_000),
... )
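To reason about how many files a given max_size produces, the following plain-Python sketch computes the row count of each output file. It assumes that every file is filled to exactly max_size rows before the next one is opened. The documentation only promises a maximum, so actual file sizes may differ. partition_row_counts is a hypothetical helper, not part of Polars.

```python
def partition_row_counts(total_rows: int, max_size: int) -> list[int]:
    """Row counts per output file, assuming each file is filled before the next opens."""
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of rows")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    # Number of completely filled files, plus the rows left for a final partial file.
    full_files, remainder = divmod(total_rows, max_size)
    counts = [max_size] * full_files
    if remainder:
        counts.append(remainder)
    return counts


# A 250 000-row input with max_size=100_000 gives two full files and one partial file.
counts = partition_row_counts(250_000, 100_000)
print(counts)  # [100000, 100000, 50000]
```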
- __init__(
    base_path: str | Path,
    *,
    file_path: Callable[[BasePartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
    max_size: int,
  )
Methods
__init__(base_path, *[, file_path])