polars.LazyFrame.profile

LazyFrame.profile(
    *,
    type_coercion: bool = True,
    _type_check: bool = True,
    predicate_pushdown: bool = True,
    projection_pushdown: bool = True,
    simplify_expression: bool = True,
    no_optimization: bool = False,
    slice_pushdown: bool = True,
    comm_subplan_elim: bool = True,
    comm_subexpr_elim: bool = True,
    cluster_with_columns: bool = True,
    collapse_joins: bool = True,
    show_plot: bool = False,
    truncate_nodes: int = 0,
    figsize: tuple[int, int] = (18, 8),
    engine: EngineType = 'auto',
    _check_order: bool = True,
    **_kwargs: Any,
) → tuple[DataFrame, DataFrame]

Profile a LazyFrame.

Run the query and return a tuple of two DataFrames: the materialized result, and a DataFrame with profiling information for each node that was executed.

Timings are reported in microseconds.

Parameters:

type_coercion: Do type coercion optimization.
predicate_pushdown: Do predicate pushdown optimization.
projection_pushdown: Do projection pushdown optimization.
simplify_expression: Run the expression-simplification optimization.
no_optimization: Turn off (certain) optimizations.
slice_pushdown: Do slice pushdown optimization.
comm_subplan_elim: Will try to cache branching subplans that occur on self-joins or unions.
comm_subexpr_elim: Common subexpressions will be cached and reused.
cluster_with_columns: Combine sequential independent calls to with_columns.
collapse_joins: Collapse a join and filters into a faster join.
show_plot: Show a Gantt chart of the profiling result.
truncate_nodes: Truncate the label lengths in the Gantt chart to this number of characters.
figsize: Figure size (matplotlib figsize) of the profiling plot.
engine: Select the engine used to process the query (optional). At the moment, if set to "auto" (default), the query is run using the Polars in-memory engine. Polars will also attempt to use the engine set by the POLARS_ENGINE_AFFINITY environment variable; if it cannot run the query using the selected engine, the query is run using the Polars in-memory engine. If set to "gpu", the GPU engine is used. Fine-grained control over the GPU engine, for example which device to use on a system with multiple devices, is possible by providing a GPUEngine object with configuration options.

Note

GPU mode is considered unstable. Not all queries will run successfully on the GPU; however, they should fall back transparently to the default engine if execution is not supported.

Running with POLARS_VERBOSE=1 reports whether a query fell back, and why.

Note

The GPU engine does not support streaming; if streaming is enabled, GPU execution is switched off.

Examples

>>> lf = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort(
...     "a"
... ).profile()  
(shape: (3, 3)
 ┌─────┬─────┬─────┐
 │ a   ┆ b   ┆ c   │
 │ --- ┆ --- ┆ --- │
 │ str ┆ i64 ┆ i64 │
 ╞═════╪═════╪═════╡
 │ a   ┆ 4   ┆ 10  │
 │ b   ┆ 11  ┆ 10  │
 │ c   ┆ 6   ┆ 1   │
 └─────┴─────┴─────┘,
 shape: (3, 3)
 ┌─────────────────────────┬───────┬──────┐
 │ node                    ┆ start ┆ end  │
 │ ---                     ┆ ---   ┆ ---  │
 │ str                     ┆ u64   ┆ u64  │
 ╞═════════════════════════╪═══════╪══════╡
 │ optimization            ┆ 0     ┆ 5    │
 │ group_by_partitioned(a) ┆ 5     ┆ 470  │
 │ sort(a)                 ┆ 475   ┆ 1964 │
 └─────────────────────────┴───────┴──────┘)