polars.PartitionMaxSize

class polars.PartitionMaxSize(
base_path: str | Path,
*,
file_path: Callable[[BasePartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
max_size: int,
)

Partitioning scheme to write files with a maximum size.

This partitioning scheme generates files that each hold at most a given number of rows. When the file being written reaches the maximum size, it is closed and a new file is opened.

Warning

This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
base_path

The base path for the output files.

file_path

A callback that registers or modifies the output path for each partition, relative to base_path. The callback receives a polars.io.partition.BasePartitionContext that contains information about the partition.

If no callback is given, it defaults to {ctx.file_idx}.{EXT}.

max_size : int

The maximum number of rows in each generated file.
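
Because max_size is measured in rows, the number of output files can be estimated from the row count alone. The sketch below is plain Python, not Polars code. The helper rows_per_file is a hypothetical name, and it assumes every file is filled to exactly max_size rows before the next one is opened, which the engine may not guarantee.

```python
def rows_per_file(total_rows: int, max_size: int) -> list[int]:
    """Row counts per file under a row-based size cap.

    Assumption: each file is filled completely before rolling over.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    # Full files first, then one partial file for any leftover rows.
    full, remainder = divmod(total_rows, max_size)
    return [max_size] * full + ([remainder] if remainder else [])


counts = rows_per_file(250_000, 100_000)
print(counts)  # [100000, 100000, 50000]
```

Under that assumption, 250 000 input rows with max_size=100_000 would produce three files, the last one only half full.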

Examples

Split a Parquet file into smaller CSV files with at most 100 000 rows each:

>>> pl.scan_parquet("/path/to/file.parquet").sink_csv(
...     pl.PartitionMaxSize("./out", max_size=100_000),
... )
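
A file_path callback can replace the default {ctx.file_idx}.{EXT} naming. The following is a minimal sketch, not the library's own example: chunk_name and FakeContext are hypothetical names, and FakeContext is only a stand-in that mimics the file_idx attribute of BasePartitionContext so the callback can be tried without running a query.

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for BasePartitionContext (only file_idx)."""

    file_idx: int


def chunk_name(ctx) -> str:
    """Return a zero-padded CSV file name, relative to base_path."""
    return f"chunk-{ctx.file_idx:05d}.csv"


# Try the callback on a few illustrative file indices.
names = [chunk_name(FakeContext(file_idx=i)) for i in range(3)]
print(names)  # ['chunk-00000.csv', 'chunk-00001.csv', 'chunk-00002.csv']
```

Following the signature above, the callback would then be passed as pl.PartitionMaxSize("./out", file_path=chunk_name, max_size=100_000). Zero-padding keeps the files in order when they are listed alphabetically.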
__init__(
base_path: str | Path,
*,
file_path: Callable[[BasePartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
max_size: int,
) -> None

Methods

__init__(base_path, *[, file_path])