polars.LazyFrame.collect_async#
- LazyFrame.collect_async(
- *,
- gevent: bool = False,
- type_coercion: bool = True,
- predicate_pushdown: bool = True,
- projection_pushdown: bool = True,
- simplify_expression: bool = True,
- no_optimization: bool = False,
- slice_pushdown: bool = True,
- comm_subplan_elim: bool = True,
- comm_subexpr_elim: bool = True,
- cluster_with_columns: bool = True,
- streaming: bool = False,
Collect DataFrame asynchronously in thread pool.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
Collects into a DataFrame (like
collect()
) but, instead of returning a DataFrame directly, it is scheduled to be collected inside a thread pool, while this method returns almost instantly.This can be useful if you use
gevent
orasyncio
and want to release control to other greenlets/tasks while LazyFrames are being collected.- Parameters:
- gevent
Return wrapper to
gevent.event.AsyncResult
instead of Awaitable- type_coercion
Do type coercion optimization.
- predicate_pushdown
Do predicate pushdown optimization.
- projection_pushdown
Do projection pushdown optimization.
- simplify_expression
Run simplify expressions optimization.
- no_optimization
Turn off (certain) optimizations.
- slice_pushdown
Slice pushdown optimization.
- comm_subplan_elim
Will try to cache branching subplans that occur on self-joins or unions.
- comm_subexpr_elim
Common subexpressions will be cached and reused.
- cluster_with_columns
Combine sequential independent calls to with_columns
- streaming
Process the query in batches to handle larger-than-memory data. If set to
False
(default), the entire query is processed in a single batch.Warning
Streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.
Note
Use
explain()
to see if Polars can process the query in streaming mode.
- Returns:
- If
gevent=False
(default) then returns an awaitable. - If
gevent=True
then returns wrapper that has a .get(block=True, timeout=None)
method.
- If
See also
polars.collect_all
Collect multiple LazyFrames at the same time.
polars.collect_all_async
Collect multiple LazyFrames at the same time lazily.
Notes
In case of error
set_exception
is used onasyncio.Future
/gevent.event.AsyncResult
and will be reraised by them.Examples
>>> import asyncio >>> lf = pl.LazyFrame( ... { ... "a": ["a", "b", "a", "b", "b", "c"], ... "b": [1, 2, 3, 4, 5, 6], ... "c": [6, 5, 4, 3, 2, 1], ... } ... ) >>> async def main(): ... return await ( ... lf.group_by("a", maintain_order=True) ... .agg(pl.all().sum()) ... .collect_async() ... ) >>> asyncio.run(main()) shape: (3, 3) ┌─────┬─────┬─────┐ │ a ┆ b ┆ c │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ a ┆ 4 ┆ 10 │ │ b ┆ 11 ┆ 10 │ │ c ┆ 6 ┆ 1 │ └─────┴─────┴─────┘