polars.LazyFrame.profile

LazyFrame.profile(
*,
type_coercion: bool = True,
predicate_pushdown: bool = True,
projection_pushdown: bool = True,
simplify_expression: bool = True,
no_optimization: bool = False,
slice_pushdown: bool = True,
comm_subplan_elim: bool = True,
comm_subexpr_elim: bool = True,
cluster_with_columns: bool = True,
collapse_joins: bool = True,
show_plot: bool = False,
truncate_nodes: int = 0,
figsize: tuple[int, int] = (18, 8),
engine: EngineType = 'auto',
optimizations: QueryOptFlags = (),
**_kwargs: Any,
) → tuple[DataFrame, DataFrame]

Profile a LazyFrame.

This runs the query and returns a tuple of two DataFrames: the materialized result, and a frame with profiling information for each node that was executed.

The units of the timings are microseconds.

Parameters:
type_coercion

Do type coercion optimization.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

predicate_pushdown

Do predicate pushdown optimization.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

projection_pushdown

Do projection pushdown optimization.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

simplify_expression

Run the expression simplification optimization.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

no_optimization

Turn off (certain) optimizations.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

slice_pushdown

Do slice pushdown optimization.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

comm_subplan_elim

Try to cache branching subplans that occur on self-joins or unions.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

comm_subexpr_elim

Cache and reuse common subexpressions.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

cluster_with_columns

Combine sequential independent calls to with_columns.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

collapse_joins

Collapse a join and filters into a faster join.

Deprecated since version 1.30.0: Use the optimizations parameter instead.

show_plot

Show a Gantt chart of the profiling result.

truncate_nodes

Truncate the node labels in the Gantt chart to this number of characters.

figsize

Matplotlib figsize (width, height) of the profiling plot.

engine

Select the engine used to process the query (default "auto"):

  • "auto": use the engine set by Config.set_engine_affinity or the POLARS_ENGINE_AFFINITY environment variable, falling back to "in-memory" if unset (this default may change in a future release).

  • "in-memory": use the in-memory engine, which is currently the default.

  • "streaming": use the streaming engine, which processes queries in batches, reducing memory pressure and often outperforming the in-memory engine. This will soon become the default engine of Polars.

  • "gpu": use the CUDA GPU engine (requires an Nvidia GPU and cudf-polars). Pass a GPUEngine object for fine-grained control (e.g. device selection on multi-GPU systems).

If the selected engine cannot run the query, Polars falls back to the in-memory engine.

Note

GPU mode is considered unstable. Not all queries will run successfully on the GPU; those that are not supported should fall back transparently to the default engine.

Running with POLARS_VERBOSE=1 reports whether a query fell back, and why.

Note

The GPU engine does not support streaming. If streaming is enabled, GPU execution is switched off.

optimizations

The optimization passes to run during query optimization.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Examples

>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort(
...     "a"
... ).profile()  
(shape: (3, 3)
 ┌─────┬─────┬─────┐
 │ a   ┆ b   ┆ c   │
 │ --- ┆ --- ┆ --- │
 │ str ┆ i64 ┆ i64 │
 ╞═════╪═════╪═════╡
 │ a   ┆ 4   ┆ 10  │
 │ b   ┆ 11  ┆ 10  │
 │ c   ┆ 6   ┆ 1   │
 └─────┴─────┴─────┘,
 shape: (3, 3)
 ┌─────────────────────────┬───────┬──────┐
 │ node                    ┆ start ┆ end  │
 │ ---                     ┆ ---   ┆ ---  │
 │ str                     ┆ u64   ┆ u64  │
 ╞═════════════════════════╪═══════╪══════╡
 │ optimization            ┆ 0     ┆ 5    │
 │ group_by_partitioned(a) ┆ 5     ┆ 470  │
 │ sort(a)                 ┆ 475   ┆ 1964 │
 └─────────────────────────┴───────┴──────┘)