polars.LazyFrame.fetch

LazyFrame.fetch(
n_rows: int = 500,
*,
type_coercion: bool = True,
predicate_pushdown: bool = True,
projection_pushdown: bool = True,
simplify_expression: bool = True,
no_optimization: bool = False,
slice_pushdown: bool = True,
comm_subplan_elim: bool = True,
comm_subexpr_elim: bool = True,
streaming: bool = False,
) → DataFrame

Collect a small number of rows for debugging purposes.

fetch() works like collect(), but it overrides the number of rows read by every scan operation with n_rows. It is a utility for debugging a query on a small number of rows.

Note that fetch() does not guarantee the final number of rows in the DataFrame. Filters, joins, and a scanned file that holds fewer rows than requested all influence the final row count.

Parameters:
n_rows

Collect n_rows from each data source.

type_coercion

Run type coercion optimization.

predicate_pushdown

Run predicate pushdown optimization.

projection_pushdown

Run projection pushdown optimization.

simplify_expression

Run simplify expressions optimization.

no_optimization

Turn off optimizations.

slice_pushdown

Run slice pushdown optimization.

comm_subplan_elim

Will try to cache branching subplans that occur on self-joins or unions.

comm_subexpr_elim

Common subexpressions will be cached and reused.

streaming

Run parts of the query in a streaming fashion (this is in an alpha state).

Returns:
DataFrame

Examples

>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf.groupby("a", maintain_order=True).agg(pl.all().sum()).fetch(2)
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 6   │
│ b   ┆ 2   ┆ 5   │
└─────┴─────┴─────┘
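
As a rough illustration of why the final row count is not guaranteed, the sketch below models the behaviour described above in plain Python. It is a toy model, not Polars code and not a description of Polars internals: `toy_scan` and the sample rows are hypothetical names introduced only for this example. It assumes nothing beyond what the description states, namely that each scan is capped at n_rows and that later operations see only the rows that survived the cap.

```python
# Toy model in plain Python (not Polars): each "scan" is capped at n_rows,
# and later operations act only on the rows that survived the cap.


def toy_scan(rows, n_rows):
    """Hypothetical stand-in for a scan operation limited to n_rows."""
    return rows[:n_rows]


source = [
    {"a": "a", "b": 1},
    {"a": "b", "b": 2},
    {"a": "a", "b": 3},
    {"a": "b", "b": 4},
]

# Case 1: a filter applied after the capped scan leaves fewer than n_rows rows.
scanned = toy_scan(source, 3)
filtered = [row for row in scanned if row["a"] == "b"]

# Case 2: the source holds fewer rows than requested.
short = toy_scan(source, 10)

# Case 3: a self-join on "a" over the capped rows yields more than n_rows rows.
joined = [
    (left, right)
    for left in scanned
    for right in scanned
    if left["a"] == right["a"]
]

print(len(filtered), len(short), len(joined))  # prints: 1 4 5
```

In all three cases the row limit is applied where the data is read, so the size of the result depends on what the rest of the query does with those rows.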