polars.LazyFrame.sink_parquet
LazyFrame.sink_parquet(
    path: str | Path,
    *,
    compression: str = 'zstd',
    compression_level: int | None = None,
    statistics: bool = False,
    row_group_size: int | None = None,
    data_pagesize_limit: int | None = None,
    maintain_order: bool = True,
    type_coercion: bool = True,
    predicate_pushdown: bool = True,
    projection_pushdown: bool = True,
    simplify_expression: bool = True,
    slice_pushdown: bool = True,
    no_optimization: bool = False,
) -> None
Evaluate the query in streaming mode and write to a Parquet file.

This allows streaming results that are larger than RAM to be written to disk.

Parameters:
path
    File path to which the file should be written.
compression : {'lz4', 'uncompressed', 'snappy', 'gzip', 'lzo', 'brotli', 'zstd'}
    Choose "zstd" for good compression performance. Choose "lz4" for fast compression/decompression. Choose "snappy" for better backwards compatibility with older Parquet readers.
compression_level
    The level of compression to use; higher levels produce smaller files on disk. Supported ranges per codec (an example is shown under Examples below):

    - "gzip": min-level 0, max-level 10
    - "brotli": min-level 0, max-level 11
    - "zstd": min-level 1, max-level 22
statistics
    Write statistics to the Parquet headers. This requires extra compute.
row_group_size
    Size of the row groups in number of rows. If None (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds (see Examples below).
data_pagesize_limit
    Size limit of individual data pages. If not set, defaults to 1024 * 1024 bytes.
maintain_order
    Maintain the order in which data is processed. Setting this to False will be slightly faster (see Examples below).
type_coercion
    Do type coercion optimization.
predicate_pushdown
    Do predicate pushdown optimization.
projection_pushdown
    Do projection pushdown optimization.
simplify_expression
    Run simplify expressions optimization.
slice_pushdown
    Slice pushdown optimization.
no_optimization
    Turn off (certain) optimizations (see Examples below).
 
Returns:
    None
 
Examples

>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")
>>> lf.sink_parquet("out.parquet")
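
A sketch of choosing a codec and level via the compression and compression_level parameters documented above; the input and output paths here are placeholders:

>>> import polars as pl
>>> lf = pl.scan_csv("data/events.csv")  # hypothetical input path
>>> # zstd (the default) gives a good size/speed trade-off; higher levels
>>> # (1-22) shrink the file further at extra CPU cost
>>> lf.sink_parquet("events.parquet", compression="zstd", compression_level=10)
>>> # lz4 favors fast compression/decompression over file size
>>> lf.sink_parquet("events_lz4.parquet", compression="lz4")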
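
A sketch of tuning the write itself with row_group_size and statistics; the values and paths are illustrative, not recommendations:

>>> import polars as pl
>>> lf = pl.scan_csv("data/events.csv")  # hypothetical input path
>>> lf.sink_parquet(
...     "events.parquet",
...     row_group_size=100_000,  # rows per row group, instead of DataFrame chunks
...     statistics=True,  # write statistics to the headers; costs extra compute
... )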
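
If the row order of the output does not matter, the ordering guarantee can be dropped for a slightly faster sink, as noted under maintain_order; paths are placeholders:

>>> import polars as pl
>>> lf = pl.scan_csv("data/events.csv")  # hypothetical input path
>>> lf.sink_parquet("events_unordered.parquet", maintain_order=False)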
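
The optimization flags are all enabled by default and are mainly useful when debugging a query plan; a sketch with placeholder paths:

>>> import polars as pl
>>> lf = pl.scan_csv("data/events.csv")  # hypothetical input path
>>> lf.sink_parquet("debug.parquet", no_optimization=True)  # turn off (certain) optimizations
>>> lf.sink_parquet("no_pred.parquet", predicate_pushdown=False)  # or toggle a single pass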