polars.LazyFrame.sink_parquet
- LazyFrame.sink_parquet(
- path: str | Path,
- *,
- compression: str = 'zstd',
- compression_level: int | None = None,
- statistics: bool = False,
- row_group_size: int | None = None,
- data_pagesize_limit: int | None = None,
- maintain_order: bool = True,
- type_coercion: bool = True,
- predicate_pushdown: bool = True,
- projection_pushdown: bool = True,
- simplify_expression: bool = True,
- no_optimization: bool = False,
- slice_pushdown: bool = True,
- ) → DataFrame
Persists a LazyFrame at the provided path.
This allows streaming results that are larger than RAM to be written to disk.
- Parameters:
- path
File path to which the file should be written.
- compression {‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}
Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for more backwards compatibility guarantees when you deal with older parquet readers.
- compression_level
The level of compression to use. Higher compression means smaller files on disk.
“gzip” : min-level: 0, max-level: 10.
“brotli” : min-level: 0, max-level: 11.
“zstd” : min-level: 1, max-level: 22.
- statistics
Write statistics to the parquet headers. This requires extra compute.
- row_group_size
Size of the row groups in number of rows. If None (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds. If None and use_pyarrow=True, the row group size will be the minimum of the DataFrame size and 64 * 1024 * 1024.
- data_pagesize_limit
Size limit of individual data pages. If not set, defaults to 1024 * 1024 bytes.
- maintain_order
Maintain the order in which data is processed. Setting this to False will be slightly faster.
- type_coercion
Do type coercion optimization.
- predicate_pushdown
Do predicate pushdown optimization.
- projection_pushdown
Do projection pushdown optimization.
- simplify_expression
Run simplify expressions optimization.
- no_optimization
Turn off (certain) optimizations.
- slice_pushdown
Do slice pushdown optimization.
- Returns:
- DataFrame
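The level ranges listed under compression_level can be checked before a sink is started. The sketch below is a hypothetical helper and not part of Polars: the names `LEVEL_RANGES` and `check_compression_level` are invented here, and the ranges are copied from the parameter description above. Rejecting a level for codecs without a listed range is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional

# (min, max) levels per codec, copied from the compression_level notes above.
LEVEL_RANGES = {"gzip": (0, 10), "brotli": (0, 11), "zstd": (1, 22)}


def check_compression_level(compression: str, level: Optional[int]) -> Optional[int]:
    """Hypothetical helper: validate `level` against the documented range."""
    if level is None:
        # None means "let the writer choose its default level".
        return None
    if compression not in LEVEL_RANGES:
        # Assumption of this sketch: codecs without a listed range take no level.
        raise ValueError(f"{compression!r} has no documented level range")
    low, high = LEVEL_RANGES[compression]
    if not low <= level <= high:
        raise ValueError(
            f"{compression!r} level must be between {low} and {high}, got {level}"
        )
    return level
```

The returned value could then be passed on as `compression_level=...` together with the matching `compression=...` argument.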
Examples
>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")
>>> lf.sink_parquet("out.parquet")