polars.LazyFrame.sink_batches
- LazyFrame.sink_batches(
- function: Callable[[DataFrame], bool | None],
- *,
- chunk_size: int | None = None,
- maintain_order: bool = True,
- lazy: bool = False,
- engine: EngineType = 'auto',
- optimizations: QueryOptFlags = (),
- )
Evaluate the query and call a user-defined function for every ready batch.
This allows streaming results that are larger than RAM in certain cases.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Warning
This method is much slower than native sinks. Only use it if you cannot implement your logic otherwise.
- Parameters:
- function
Function to run with a batch that is ready. If the function returns
True, this signals that no more results are needed, allowing for early stopping.
- chunk_size
The number of rows that are buffered before the callback is called.
- maintain_order
Maintain the order in which data is processed. Setting this to
False will be slightly faster.
- lazy: bool
Wait to start execution until
collect is called.
- engine
Select the engine used to process the query (default
"auto"):
  - "auto": use the engine set by Config.set_engine_affinity or the POLARS_ENGINE_AFFINITY environment variable, falling back to "streaming" if unset.
  - "in-memory": use the in-memory engine before writing, this is the default engine.
  - "streaming": use the streaming engine, which processes queries in batches, reducing memory pressure and often outperforming the in-memory engine. This will soon become the default engine of Polars.
  - "gpu": use the CUDA GPU engine (requires an Nvidia GPU and cudf-polars). Pass a GPUEngine object for fine-grained control.
If the selected engine cannot run the query, Polars falls back to the streaming engine.
- optimizations
The optimization passes done during query optimization.
This has no effect if
lazy is set to True.
Examples
>>> lf = pl.scan_csv("/path/to/my_larger_than_ram_file.csv")
>>> lf.sink_batches(lambda df: print(df))