polars.PartitionParted
class polars.PartitionParted(
    base_path: str | Path,
    *,
    file_path: Callable[[KeyedPartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
    by: str | Expr | Sequence[str | Expr] | Mapping[str, Expr],
    include_key: bool = True,
)
Partitioning scheme to split parted dataframes.

This is a specialized version of PartitionByKey. Whereas PartitionByKey accepts data in any order, this scheme expects the input data to be pre-grouped or pre-sorted on the key expressions. This scheme has far less overhead than PartitionByKey, but may not always be applicable.

Each new value of the key expressions starts a new partition; therefore, repeating the same value multiple times may overwrite previous partitions.
Warning
This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.
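The overwrite caveat can be illustrated without polars: a parted scheme splits the input on runs of consecutive key values, so a key that reappears after an interruption starts a fresh run targeting the same partition. A minimal sketch using the standard library's itertools.groupby, which groups consecutive elements in the same way (the sample rows below are illustrative, not from the polars API):

```python
from itertools import groupby

# Rows keyed by "year"; note 2023 reappears after 2024 (data not pre-sorted).
rows = [
    {"year": 2023, "v": 1},
    {"year": 2023, "v": 2},
    {"year": 2024, "v": 3},
    {"year": 2023, "v": 4},  # starts a NEW run, not merged with the first 2023 run
]

# groupby splits on consecutive runs, just as a parted partitioning scheme does.
runs = [
    (year, [r["v"] for r in grp])
    for year, grp in groupby(rows, key=lambda r: r["year"])
]
print(runs)  # [(2023, [1, 2]), (2024, [3]), (2023, [4])]
```

Three runs come out, not two: the second 2023 run would resolve to the same partition path as the first and could overwrite its output. With PartitionByKey, by contrast, both 2023 runs would land in one partition.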
Parameters:
base_path
    The base path for the output files. Use the mkdir option on the sink_* methods to ensure directories in the path are created.
file_path
    A callback to register or modify the output path for each partition, relative to base_path. The callback receives a polars.io.partition.KeyedPartitionContext that contains information about the partition. If no callback is given, it defaults to {ctx.keys.hive_dirs()}/{ctx.in_part_idx}.{EXT}.
by
    The expressions to partition by.
include_key : bool
    Whether to include the key columns in the output files.
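To see what the default file_path template resolves to, here is a small stand-in sketch. The template, the hive_dirs() method, and the in_part_idx attribute come from the description above; the FakeKeys/FakeContext classes and their field values are hypothetical stand-ins, not the real polars.io.partition.KeyedPartitionContext:

```python
from dataclasses import dataclass


@dataclass
class FakeKeys:
    # Hypothetical stand-in: (column name, key value) pairs for one partition.
    pairs: list[tuple[str, object]]

    def hive_dirs(self) -> str:
        # Hive-style directory segments, e.g. "year=2024"
        return "/".join(f"{name}={value}" for name, value in self.pairs)


@dataclass
class FakeContext:
    # Models only the two attributes the default template uses.
    keys: FakeKeys
    in_part_idx: int  # index of the output file within the partition


ctx = FakeContext(keys=FakeKeys(pairs=[("year", 2024)]), in_part_idx=0)
ext = "csv"  # {EXT} is filled in from the sink type, e.g. sink_csv

default_path = f"{ctx.keys.hive_dirs()}/{ctx.in_part_idx}.{ext}"
print(default_path)  # year=2024/0.csv
```

A custom callback would receive the real context object and return a path, file object, or string instead of this template.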
Examples

Split a parquet file by a column year into CSV files:

>>> pl.scan_parquet("/path/to/file.parquet").sink_csv(
...     PartitionParted("./out", by="year"),
...     mkdir=True,
... )
__init__(
    base_path: str | Path,
    *,
    file_path: Callable[[KeyedPartitionContext], Path | str | IO[bytes] | IO[str]] | None = None,
    by: str | Expr | Sequence[str | Expr] | Mapping[str, Expr],
    include_key: bool = True,
)
Methods
__init__(base_path, *, by[, file_path, include_key])