polars.LazyFrame.collect_async#

LazyFrame.collect_async(
*,
gevent: bool = False,
engine: EngineType = 'auto',
optimizations: QueryOptFlags = (),
) Awaitable[DataFrame] | _GeventDataFrameResult[DataFrame][source]#

Collect DataFrame asynchronously in thread pool.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Collects into a DataFrame (like collect()) but, instead of returning a DataFrame directly, it is scheduled to be collected inside a thread pool, while this method returns almost instantly.

This can be useful if you use gevent or asyncio and want to release control to other greenlets/tasks while LazyFrames are being collected.

Parameters:
gevent

Return wrapper to gevent.event.AsyncResult instead of Awaitable

engine

Select the engine used to process the query (default "auto"):

  • "auto": use the engine set by Config.set_engine_affinity or the POLARS_ENGINE_AFFINITY environment variable, falling back to "in-memory" if unset (this default may change in a future release).

  • "in-memory": use the in-memory engine, this is the default engine.

  • "streaming": use the streaming engine, which processes queries in batches, reducing memory pressure and often outperforming the in-memory engine. This will soon become the default engine of Polars.

  • "gpu": use the CUDA GPU engine (requires an Nvidia GPU and cudf-polars). Pass a GPUEngine object for fine-grained control (e.g. device selection on multi-GPU systems).

If the selected engine cannot run the query, Polars falls back to the in-memory engine.

Note

The GPU engine does not support async, or running in the background. If either are enabled, then GPU execution is switched off.

optimizations

The optimization passes done during query optimization.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Returns:
If gevent=False (default) then returns an awaitable.
If gevent=True then returns wrapper that has a
.get(block=True, timeout=None) method.

See also

polars.collect_all

Collect multiple LazyFrames at the same time.

polars.collect_all_async

Collect multiple LazyFrames at the same time lazily.

Notes

In case of error set_exception is used on asyncio.Future/gevent.event.AsyncResult and will be reraised by them.

Examples

>>> import asyncio
>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> async def main():
...     return await (
...         lf.group_by("a", maintain_order=True)
...         .agg(pl.all().sum())
...         .collect_async()
...     )
>>> asyncio.run(main())
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘