polars.LazyFrame.collect
LazyFrame.collect(
    *,
    type_coercion: bool = True,
    predicate_pushdown: bool = True,
    projection_pushdown: bool = True,
    simplify_expression: bool = True,
    slice_pushdown: bool = True,
    comm_subplan_elim: bool = True,
    comm_subexpr_elim: bool = True,
    cluster_with_columns: bool = True,
    collapse_joins: bool = True,
    no_optimization: bool = False,
    engine: EngineType = 'auto',
    background: bool = False,
    optimizations: QueryOptFlags = (),
    **_kwargs: Any,
)
Materialize this LazyFrame into a DataFrame.
By default, all query optimizations are enabled. Individual optimizations may be disabled by setting the corresponding parameter to False.
- Parameters:
- type_coercion
Do type coercion optimization.
Deprecated since version 1.30.0: Use the optimizations parameter.
- predicate_pushdown
Do predicate pushdown optimization.
Deprecated since version 1.30.0: Use the optimizations parameter.
- projection_pushdown
Do projection pushdown optimization.
Deprecated since version 1.30.0: Use the optimizations parameter.
- simplify_expression
Run the expression simplification optimization.
Deprecated since version 1.30.0: Use the optimizations parameter.
- slice_pushdown
Do slice pushdown optimization.
Deprecated since version 1.30.0: Use the optimizations parameter.
- comm_subplan_elim
Will try to cache branching subplans that occur on self-joins or unions.
Deprecated since version 1.30.0: Use the optimizations parameter.
- comm_subexpr_elim
Common subexpressions will be cached and reused.
Deprecated since version 1.30.0: Use the optimizations parameter.
- cluster_with_columns
Combine sequential independent calls to with_columns.
Deprecated since version 1.30.0: Use the optimizations parameter.
- collapse_joins
Collapse a join and filters into a faster join.
Deprecated since version 1.30.0: Use the optimizations parameter.
- no_optimization
Turn off (certain) optimizations.
Deprecated since version 1.30.0: Use the optimizations parameter.
- engine
Select the engine used to process the query (default "auto"):
  - "auto": use the engine set by Config.set_engine_affinity or the POLARS_ENGINE_AFFINITY environment variable, falling back to "in-memory" if unset (this default may change in a future release).
  - "in-memory": use the in-memory engine. This is the default engine.
  - "streaming": use the streaming engine, which processes queries in batches, reducing memory pressure and often outperforming the in-memory engine. This will soon become the default engine of Polars.
  - "gpu": use the CUDA GPU engine (requires an NVIDIA GPU and cudf-polars). Pass a GPUEngine object for fine-grained control (e.g. device selection on multi-GPU systems).
If the selected engine cannot run the query, Polars falls back to the in-memory engine.
Note
GPU mode is considered unstable. Not all queries will run successfully on the GPU; however, they should fall back transparently to the default engine if execution is not supported.
Running with POLARS_VERBOSE=1 will provide information if a query falls back (and why).
Note
The GPU engine does not support streaming or running in the background. If either is enabled, GPU execution is switched off.
- background
Run the query in the background and get a handle to the query. This handle can be used to fetch the result or cancel the query.
Warning
Background mode is considered unstable. It may be changed at any point without it being considered a breaking change.
- optimizations
The optimization passes done during query optimization.
Warning
This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.
- Returns:
- DataFrame
See also
explain: Print the query plan that is evaluated with collect.
profile: Collect the LazyFrame and time each node in the computation graph.
polars.collect_all: Collect multiple LazyFrames at the same time.
polars.Config.set_streaming_chunk_size: Set the size of streaming batches.
Examples
>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf.group_by("a").agg(pl.all().sum()).collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘
Collect in streaming mode
>>> lf.group_by("a").agg(pl.all().sum()).collect(
...     engine="streaming"
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘
Collect in GPU mode
>>> lf.group_by("a").agg(pl.all().sum()).collect(engine="gpu")
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 11  ┆ 10  │
│ a   ┆ 4   ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘
With control over the device used
>>> lf.group_by("a").agg(pl.all().sum()).collect(
...     engine=pl.GPUEngine(device=1)
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 11  ┆ 10  │
│ a   ┆ 4   ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘