polars.LazyFrame.collect
- LazyFrame.collect(
- *,
- type_coercion: bool = True,
- predicate_pushdown: bool = True,
- projection_pushdown: bool = True,
- simplify_expression: bool = True,
- slice_pushdown: bool = True,
- comm_subplan_elim: bool = True,
- comm_subexpr_elim: bool = True,
- no_optimization: bool = False,
- streaming: bool = False,
- _eager: bool = False,
- ) -> DataFrame
Materialize this LazyFrame into a DataFrame.
By default, all query optimizations are enabled. Individual optimizations may be disabled by setting the corresponding parameter to False.
- Parameters:
- type_coercion
Do type coercion optimization.
- predicate_pushdown
Do predicate pushdown optimization.
- projection_pushdown
Do projection pushdown optimization.
- simplify_expression
Do expression simplification optimization.
- slice_pushdown
Do slice pushdown optimization.
- comm_subplan_elim
Will try to cache branching subplans that occur on self-joins or unions.
- comm_subexpr_elim
Common subexpressions will be cached and reused.
- no_optimization
Turn off (certain) optimizations.
- streaming
Process the query in batches to handle larger-than-memory data. If set to
False (default), the entire query is processed in a single batch.
Warning
This functionality is currently in an alpha state.
Note
Use explain() to see if Polars can process the query in streaming mode.
- Returns:
- DataFrame
See also
- fetch
Run the query on the first n rows only for debugging purposes.
- explain
Print the query plan that is evaluated with collect.
- profile
Collect the LazyFrame and time each node in the computation graph.
- polars.collect_all
Collect multiple LazyFrames at the same time.
- polars.Config.set_streaming_chunk_size
Set the size of streaming batches.
Examples
>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf.group_by("a").agg(pl.all().sum()).collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘
Collect in streaming mode
>>> lf.group_by("a").agg(pl.all().sum()).collect(
...     streaming=True
... )
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘