pub struct LazyFrame {
    pub logical_plan: DslPlan,
    /* private fields */
}
Available on crate feature lazy only.
Lazy abstraction over an eager DataFrame.
It really is an abstraction over a logical plan. The methods of this struct will incrementally modify a logical plan until output is requested (via collect).
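The idea of recording operations and only running them on request can be illustrated with a toy model. The sketch below is not Polars code: ToyLazy, Op, filter_gt and limit are hypothetical names invented for illustration, and the real logical plan is far richer. It only shows that builder methods extend a plan and that nothing is computed until collect is called.

```rust
/// Hypothetical operations recorded by the toy plan (not Polars types).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    FilterGreaterThan(i64),
    Limit(usize),
}

/// A toy "lazy frame": one source column plus a list of pending operations.
#[derive(Debug, Clone)]
struct ToyLazy {
    source: Vec<i64>,
    plan: Vec<Op>,
}

impl ToyLazy {
    fn new(source: Vec<i64>) -> Self {
        ToyLazy { source, plan: Vec::new() }
    }

    /// Record a filter; nothing is computed yet.
    fn filter_gt(mut self, threshold: i64) -> Self {
        self.plan.push(Op::FilterGreaterThan(threshold));
        self
    }

    /// Record a row limit; nothing is computed yet.
    fn limit(mut self, n: usize) -> Self {
        self.plan.push(Op::Limit(n));
        self
    }

    /// Execute the recorded operations in order and return the result.
    fn collect(self) -> Vec<i64> {
        let mut rows = self.source;
        for op in &self.plan {
            match op {
                Op::FilterGreaterThan(t) => rows.retain(|v| v > t),
                Op::Limit(n) => rows.truncate(*n),
            }
        }
        rows
    }
}
```

Unlike this toy, the real collect optimizes the plan before executing it.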
Fields§
§logical_plan: DslPlan
Implementations§
impl LazyFrame
pub fn set_cached_arena(&self, lp_arena: Arena<IR>, expr_arena: Arena<AExpr>)
pub fn schema_with_arenas( &mut self, lp_arena: &mut Arena<IR>, expr_arena: &mut Arena<AExpr>, ) -> Result<Arc<Schema<DataType>>, PolarsError>
pub fn collect_schema(&mut self) -> Result<Arc<Schema<DataType>>, PolarsError>
Get a handle to the schema (a map from column names to data types) of the current LazyFrame computation.
Returns an Err if the logical plan has already encountered an error (i.e., if self.collect() would fail), Ok otherwise.
impl LazyFrame
pub fn collect_concurrently(self) -> Result<InProcessQuery, PolarsError>
impl LazyFrame
pub fn get_current_optimizations(&self) -> OptFlags
Get current optimizations.
pub fn with_optimizations(self, opt_state: OptFlags) -> LazyFrame
Set allowed optimizations.
pub fn without_optimizations(self) -> LazyFrame
Turn off all optimizations.
pub fn with_projection_pushdown(self, toggle: bool) -> LazyFrame
Toggle projection pushdown optimization.
pub fn with_cluster_with_columns(self, toggle: bool) -> LazyFrame
Toggle cluster with columns optimization.
pub fn with_collapse_joins(self, toggle: bool) -> LazyFrame
Toggle collapse joins optimization.
pub fn with_check_order(self, toggle: bool) -> LazyFrame
Check if operations are order dependent and unset maintaining_order if the order would not be observed.
pub fn with_predicate_pushdown(self, toggle: bool) -> LazyFrame
Toggle predicate pushdown optimization.
pub fn with_type_coercion(self, toggle: bool) -> LazyFrame
Toggle type coercion optimization.
pub fn with_type_check(self, toggle: bool) -> LazyFrame
Toggle type check optimization.
pub fn with_simplify_expr(self, toggle: bool) -> LazyFrame
Toggle expression simplification optimization on or off.
pub fn with_slice_pushdown(self, toggle: bool) -> LazyFrame
Toggle slice pushdown optimization.
pub fn with_row_estimate(self, toggle: bool) -> LazyFrame
Try to estimate the number of rows so that joins can determine which side to keep in memory.
pub fn _with_eager(self, toggle: bool) -> LazyFrame
Run every node eagerly. This turns off multi-node optimizations.
pub fn describe_plan(&self) -> Result<String, PolarsError>
Return a String describing the naive (un-optimized) logical plan.
pub fn describe_plan_tree(&self) -> Result<String, PolarsError>
Return a String describing the naive (un-optimized) logical plan in tree format.
pub fn describe_optimized_plan(&self) -> Result<String, PolarsError>
Return a String describing the optimized logical plan.
Returns Err if optimizing the logical plan fails.
pub fn describe_optimized_plan_tree(&self) -> Result<String, PolarsError>
Return a String describing the optimized logical plan in tree format.
Returns Err if optimizing the logical plan fails.
pub fn explain(&self, optimized: bool) -> Result<String, PolarsError>
Return a String describing the logical plan.
If optimized is true, explains the optimized plan. If optimized is false, explains the naive, un-optimized plan.
pub fn sort( self, by: impl IntoVec<PlSmallStr>, sort_options: SortMultipleOptions, ) -> LazyFrame
Add a sort operation to the logical plan.
Sorts the LazyFrame by the specified column names using the provided options.
§Example
Sort DataFrame by ‘sepal_width’ column:
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn sort_by_a(df: DataFrame) -> LazyFrame {
    df.lazy().sort(["sepal_width"], Default::default())
}
Sort by a single column with a specific order:
fn sort_with_specific_order(df: DataFrame, descending: bool) -> LazyFrame {
    df.lazy().sort(
        ["sepal_width"],
        SortMultipleOptions::new()
            .with_order_descending(descending)
    )
}
Sort by multiple columns, specifying the order for each column:
fn sort_by_multiple_columns_with_specific_order(df: DataFrame) -> LazyFrame {
    df.lazy().sort(
        ["sepal_width", "sepal_length"],
        SortMultipleOptions::new()
            .with_order_descending_multi([false, true])
    )
}
See SortMultipleOptions for more options.
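To make the per-column ordering concrete, here is a minimal standard-library sketch of what sorting by two columns with flags [false, true] means. It is not how Polars implements sorting: sort_rows is a hypothetical helper, and the rows are plain integer pairs standing in for (sepal_width, sepal_length).

```rust
/// Sort rows of (sepal_width, sepal_length) pairs by both columns,
/// with one `descending` flag per column.
fn sort_rows(rows: &mut [(i32, i32)], descending: [bool; 2]) {
    rows.sort_by(|a, b| {
        let first = if descending[0] { b.0.cmp(&a.0) } else { a.0.cmp(&b.0) };
        let second = if descending[1] { b.1.cmp(&a.1) } else { a.1.cmp(&b.1) };
        // The second column only breaks ties in the first.
        first.then(second)
    });
}
```

With flags [false, true], rows are ordered by ascending first column, and rows that tie on it are ordered by descending second column.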
pub fn sort_by_exprs<E>( self, by_exprs: E, sort_options: SortMultipleOptions, ) -> LazyFrame
Add a sort operation to the logical plan.
Sorts the LazyFrame by the provided list of expressions, which will be turned into concrete columns before sorting.
See SortMultipleOptions for more options.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
/// Sort DataFrame by 'sepal_width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort_by_exprs(vec![col("sepal_width")], Default::default())
}
pub fn top_k<E>( self, k: u32, by_exprs: E, sort_options: SortMultipleOptions, ) -> LazyFrame
pub fn bottom_k<E>( self, k: u32, by_exprs: E, sort_options: SortMultipleOptions, ) -> LazyFrame
pub fn reverse(self) -> LazyFrame
Reverse the DataFrame from top to bottom.
Row i becomes row number_of_rows - i - 1.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .reverse()
}
pub fn rename<I, J, T, S>(self, existing: I, new: J, strict: bool) -> LazyFrame
Rename columns in the DataFrame.
existing and new are iterables of the same length containing the old and corresponding new column names. Renaming happens to all existing columns simultaneously, not iteratively. If strict is true, all columns in existing must be present in the LazyFrame when rename is called; otherwise, only those columns that are actually found will be renamed (others will be ignored).
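The difference between simultaneous and iterative renaming shows up when names are swapped. The following is a standard-library sketch of the described semantics, not the Polars implementation: rename_columns is a hypothetical helper that works on a plain list of column names and reports a missing column as an Err(String) instead of a PolarsError.

```rust
use std::collections::HashMap;

/// Rename columns simultaneously. Returns Err with the missing name when
/// `strict` is true and an entry of `existing` is not a current column.
fn rename_columns(
    columns: &[&str],
    existing: &[&str],
    new: &[&str],
    strict: bool,
) -> Result<Vec<String>, String> {
    assert_eq!(existing.len(), new.len(), "existing and new must have the same length");
    if strict {
        if let Some(missing) = existing.iter().find(|name| !columns.contains(*name)) {
            return Err(missing.to_string());
        }
    }
    // Every lookup uses the original names, so earlier renames never feed later ones.
    let mapping: HashMap<&str, &str> =
        existing.iter().copied().zip(new.iter().copied()).collect();
    Ok(columns
        .iter()
        .map(|name| mapping.get(name).copied().unwrap_or(*name).to_string())
        .collect())
}
```

Renaming ["a", "b"] to ["b", "a"] therefore swaps the two names; an iterative rename would first turn "a" into "b" and then turn both into "a".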
pub fn drop<I, T>(self, columns: I) -> LazyFrame
Removes columns from the DataFrame. Note that it’s better to only select the columns you need and let the projection pushdown optimize away the unneeded columns.
Any given columns that are not in the schema will give a PolarsError::ColumnNotFound error while materializing the LazyFrame.
pub fn drop_no_validate<I, T>(self, columns: I) -> LazyFrame
Removes columns from the DataFrame. Note that it’s better to only select the columns you need and let the projection pushdown optimize away the unneeded columns.
If a column name does not exist in the schema, it will quietly be ignored.
pub fn shift<E>(self, n: E) -> LazyFrame
Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.
See the method on Series for more info on the shift operation.
pub fn shift_and_fill<E, IE>(self, n: E, fill_value: IE) -> LazyFrame
Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.
See the method on Series for more info on the shift operation.
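As a rough illustration of shifting and filling, the sketch below works on a single column held in a slice. It is an assumption-laden model, not Polars code: the hypothetical shift_and_fill helper takes a plain integer period and a constant fill value, whereas the real methods take expressions. Passing None as the fill value on a column of Option values models plain shift.

```rust
/// Shift `values` by `n` positions (positive: toward the end, negative:
/// toward the start) and put `fill` into the vacated slots.
fn shift_and_fill<T: Clone>(values: &[T], n: i64, fill: T) -> Vec<T> {
    let len = values.len();
    // Shifting by more than the length leaves only fill values.
    let k = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX).min(len);
    let mut out = Vec::with_capacity(len);
    if n >= 0 {
        out.extend(std::iter::repeat(fill).take(k));
        out.extend_from_slice(&values[..len - k]);
    } else {
        out.extend_from_slice(&values[k..]);
        out.extend(std::iter::repeat(fill).take(k));
    }
    out
}
```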
pub fn fill_null<E>(self, fill_value: E) -> LazyFrame
Fill None values in the DataFrame with an expression.
pub fn fill_nan<E>(self, fill_value: E) -> LazyFrame
Fill NaN values in the DataFrame with an expression.
pub fn cache(self) -> LazyFrame
Caches the result into a new LazyFrame.
This should be used to prevent computations running multiple times.
pub fn cast( self, dtypes: HashMap<&str, DataType, RandomState>, strict: bool, ) -> LazyFrame
Cast named frame columns, resulting in a new LazyFrame with updated dtypes.
pub fn cast_all(self, dtype: DataType, strict: bool) -> LazyFrame
Cast all frame columns to the given dtype, resulting in a new LazyFrame.
pub fn fetch(self, n_rows: usize) -> Result<DataFrame, PolarsError>
Fetch is like a collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.
Note that the fetch does not guarantee the final number of rows in the DataFrame. Filter, join operations and a lower number of rows available in the scanned file influence the final number of rows.
pub fn optimize( self, lp_arena: &mut Arena<IR>, expr_arena: &mut Arena<AExpr>, ) -> Result<Node, PolarsError>
pub fn to_alp_optimized(self) -> Result<IRPlan, PolarsError>
pub fn to_alp(self) -> Result<IRPlan, PolarsError>
pub fn _collect_post_opt<P>(self, post_opt: P) -> Result<DataFrame, PolarsError>
pub fn collect(self) -> Result<DataFrame, PolarsError>
Execute all the lazy operations and collect them into a DataFrame.
The query is optimized prior to execution.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.lazy()
        .group_by([col("foo")])
        .agg([col("bar").sum(), col("ham").mean().alias("avg_ham")])
        .collect()
}
pub fn profile(self) -> Result<(DataFrame, DataFrame), PolarsError>
Profile a LazyFrame.
This will run the query and return a tuple containing the materialized DataFrame and a DataFrame that contains profiling information of each node that is executed.
The units of the timings are microseconds.
pub fn sink_parquet( self, path: &dyn AsRef<Path>, options: ParquetWriteOptions, cloud_options: Option<CloudOptions>, ) -> Result<(), PolarsError>
Available on crate feature parquet only.
Stream a query result into a parquet file. This is useful if the final result doesn’t fit into memory. This method will return an error if the query cannot be completely done in a streaming fashion.
pub fn sink_ipc( self, path: impl AsRef<Path>, options: IpcWriterOptions, cloud_options: Option<CloudOptions>, ) -> Result<(), PolarsError>
Available on crate feature ipc only.
Stream a query result into an ipc/arrow file. This is useful if the final result doesn’t fit into memory. This method will return an error if the query cannot be completely done in a streaming fashion.
pub fn sink_csv( self, path: impl AsRef<Path>, options: CsvWriterOptions, cloud_options: Option<CloudOptions>, ) -> Result<(), PolarsError>
Available on crate feature csv only.
Stream a query result into a csv file. This is useful if the final result doesn’t fit into memory. This method will return an error if the query cannot be completely done in a streaming fashion.
pub fn sink_json( self, path: impl AsRef<Path>, options: JsonWriterOptions, cloud_options: Option<CloudOptions>, ) -> Result<(), PolarsError>
Available on crate feature json only.
Stream a query result into a JSON file. This is useful if the final result doesn’t fit into memory. This method will return an error if the query cannot be completely done in a streaming fashion.
pub fn filter(self, predicate: Expr) -> LazyFrame
Filter by some predicate expression.
The expression must yield boolean values.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .filter(col("sepal_width").is_not_null())
        .select([col("sepal_width"), col("sepal_length")])
}
pub fn select<E>(self, exprs: E) -> LazyFrame
Select (and optionally rename, with alias) columns from the query.
Columns can be selected with col; if you want to select all columns, use col(PlSmallStr::from_static("*")).
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
/// This function selects column "foo" and column "bar".
/// Column "bar" is renamed to "ham".
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select([col("foo"),
                 col("bar").alias("ham")])
}
/// This function selects all columns except "foo"
fn exclude_a_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select([col(PlSmallStr::from_static("*")).exclude(["foo"])])
}
pub fn select_seq<E>(self, exprs: E) -> LazyFrame
pub fn group_by<E, IE>(self, by: E) -> LazyGroupBy
Performs a “group-by” on a LazyFrame, producing a LazyGroupBy, which can subsequently be aggregated.
Takes a list of expressions to group on.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
use arrow::legacy::prelude::QuantileMethod;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .group_by([col("date")])
        .agg([
            col("rain").min().alias("min_rain"),
            col("rain").sum().alias("sum_rain"),
            col("rain").quantile(lit(0.5), QuantileMethod::Nearest).alias("median_rain"),
        ])
}
pub fn rolling<E>( self, index_column: Expr, group_by: E, options: RollingGroupOptions, ) -> LazyGroupBy
Available on crate feature dynamic_group_by only.
Create rolling groups based on a time column.
Also works for index values of type UInt32, UInt64, Int32, or Int64.
Unlike group_by_dynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use group_by_dynamic.
pub fn group_by_dynamic<E>( self, index_column: Expr, group_by: E, options: DynamicGroupOptions, ) -> LazyGroupBy
Available on crate feature dynamic_group_by only.
Group based on a time value (or index value of type Int32, Int64).
Time windows are calculated and rows are assigned to windows. Unlike a normal group_by, a row can be a member of multiple groups. The time/index window could be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.
A window is defined by:
- every: interval of the window
- period: length of the window
- offset: offset of the window
The group_by argument should be empty [] if you don’t want to combine this with an ordinary group_by on these keys.
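How every, period and offset interact, and why a row can land in several groups, can be sketched for an integer index. This is a simplified model under stated assumptions, not the Polars algorithm: it assumes window starts lie at offset + k * every for integer k, that each window covers the left-closed range [start, start + period), and it ignores the boundary and start-alignment settings of DynamicGroupOptions. window_starts is a hypothetical helper.

```rust
/// Start points of every window [start, start + period) that contains `t`,
/// where window starts are offset + k * every for integer k.
fn window_starts(t: i64, every: i64, period: i64, offset: i64) -> Vec<i64> {
    assert!(every > 0 && period > 0, "every and period must be positive");
    // Latest window start that is not after t.
    let mut start = offset + (t - offset).div_euclid(every) * every;
    let mut starts = Vec::new();
    // Walk back through earlier windows while they still cover t.
    while start + period > t {
        starts.push(start);
        start -= every;
    }
    starts.reverse();
    starts
}
```

When period equals every, each index value falls into exactly one window; when period is larger than every, the windows overlap and a row is assigned to several of them.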
pub fn group_by_stable<E, IE>(self, by: E) -> LazyGroupBy
Similar to group_by, but order of the DataFrame is maintained.
pub fn anti_join<E>( self, other: LazyFrame, left_on: E, right_on: E, ) -> LazyFrame
Available on crate feature semi_anti_join only.
Left anti join this query with another lazy query.
Matches on the values of the expressions left_on and right_on. For more flexible join logic, see join or join_builder.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn anti_join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .anti_join(other, col("foo"), col("bar").cast(DataType::String))
}
pub fn cross_join( self, other: LazyFrame, suffix: Option<PlSmallStr>, ) -> LazyFrame
Available on crate feature cross_join only.
Creates the Cartesian product from both frames, preserving the order of the left keys.
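A minimal sketch of what a Cartesian product that preserves left order looks like, using plain slices instead of frames. The cross_join function below is a hypothetical stand-in written for illustration and says nothing about how Polars executes the join.

```rust
/// Cartesian product of two row lists; the left rows vary slowest, so their
/// original order is preserved in the output.
fn cross_join<L: Clone, R: Clone>(left: &[L], right: &[R]) -> Vec<(L, R)> {
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            out.push((l.clone(), r.clone()));
        }
    }
    out
}
```

The output has left.len() * right.len() rows, which is why cross joins on large inputs grow quickly.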
pub fn left_join<E>( self, other: LazyFrame, left_on: E, right_on: E, ) -> LazyFrame
Left outer join this query with another lazy query.
Matches on the values of the expressions left_on and right_on. For more flexible join logic, see join or join_builder.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn left_join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .left_join(other, col("foo"), col("bar"))
}
pub fn inner_join<E>( self, other: LazyFrame, left_on: E, right_on: E, ) -> LazyFrame
Inner join this query with another lazy query.
Matches on the values of the expressions left_on and right_on. For more flexible join logic, see join or join_builder.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn inner_join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .inner_join(other, col("foo"), col("bar").cast(DataType::String))
}
pub fn full_join<E>( self, other: LazyFrame, left_on: E, right_on: E, ) -> LazyFrame
Full outer join this query with another lazy query.
Matches on the values of the expressions left_on and right_on. For more flexible join logic, see join or join_builder.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn full_join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .full_join(other, col("foo"), col("bar"))
}
pub fn semi_join<E>( self, other: LazyFrame, left_on: E, right_on: E, ) -> LazyFrame
Available on crate feature semi_anti_join only.
Left semi join this query with another lazy query.
Matches on the values of the expressions left_on and right_on. For more flexible join logic, see join or join_builder.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn semi_join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .semi_join(other, col("foo"), col("bar").cast(DataType::String))
}
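Semi and anti joins only filter the left side and never add columns or duplicate rows. The sketch below models that with the standard library; semi_or_anti_join is a hypothetical helper over (key, value) rows and a list of right-hand keys, and is not taken from Polars.

```rust
use std::collections::HashSet;

/// Keep the left rows whose key has a match on the right (semi join), or
/// the left rows without one (anti join) when `anti` is true.
fn semi_or_anti_join<'a>(
    left: &[(&'a str, i32)],
    right_keys: &[&str],
    anti: bool,
) -> Vec<(&'a str, i32)> {
    let keys: HashSet<&str> = right_keys.iter().copied().collect();
    left.iter()
        .copied()
        .filter(|(key, _)| keys.contains(*key) != anti)
        .collect()
}
```

Duplicate keys on the right do not multiply left rows, which is the main difference from an inner join.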
pub fn join<E>( self, other: LazyFrame, left_on: E, right_on: E, args: JoinArgs, ) -> LazyFrame
Generic function to join two LazyFrames.
join can join on multiple columns, given as two lists of expressions, and with a JoinType specified by how. Non-joined column names in the right DataFrame that already exist in this DataFrame are suffixed with "_right". For control over how columns are renamed and parallelization options, use join_builder.
Any provided args.slice parameter is not considered, but set by the internal optimizer.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .join(other, [col("foo"), col("bar")], [col("foo"), col("bar")], JoinArgs::new(JoinType::Inner))
}
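The "_right" suffix rule can be pictured on column names alone. The sketch below is an illustration under assumptions, not the Polars implementation: joined_column_names is a hypothetical helper, and it assumes the join keys have the same names on both sides and appear only once in the output.

```rust
/// Output column names of a join: all left columns, then the right columns
/// that are not join keys, suffixed with "_right" on a name clash.
fn joined_column_names(left: &[&str], right: &[&str], join_keys: &[&str]) -> Vec<String> {
    let mut names: Vec<String> = left.iter().map(|c| c.to_string()).collect();
    for col in right {
        // Assumption of this sketch: key columns are emitted once, from the left.
        if join_keys.contains(col) {
            continue;
        }
        if left.contains(col) {
            names.push(format!("{}_right", col));
        } else {
            names.push(col.to_string());
        }
    }
    names
}
```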
pub fn join_builder(self) -> JoinBuilder
Consume self and return a JoinBuilder to customize a join on this LazyFrame.
After the JoinBuilder has been created and set up, calling finish() on it will give back the LazyFrame representing the join operation.
pub fn with_column(self, expr: Expr) -> LazyFrame
Add or replace a column, given as an expression, to a DataFrame.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_column(
            when(col("sepal_length").lt(lit(5.0)))
                .then(lit(10))
                .otherwise(lit(1))
                .alias("new_column_name"),
        )
}
pub fn with_columns<E>(self, exprs: E) -> LazyFrame
Add or replace multiple columns, given as expressions, to a DataFrame.
§Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_columns(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_columns(
            vec![lit(10).alias("foo"), lit(100).alias("bar")]
        )
}
pub fn with_columns_seq<E>(self, exprs: E) -> LazyFrame
Add or replace multiple columns to a DataFrame, but evaluate them sequentially.
pub fn with_context<C>(self, contexts: C) -> LazyFrame
pub fn max(self) -> LazyFrame
Aggregate all the columns as their maximum values.
Aggregated columns will have the same names as the original columns.
pub fn min(self) -> LazyFrame
Aggregate all the columns as their minimum values.
Aggregated columns will have the same names as the original columns.
pub fn sum(self) -> LazyFrame
Aggregate all the columns as their sum values.
Aggregated columns will have the same names as the original columns.
- Boolean columns will sum to a u32 containing the number of trues.
- For integer columns, the ordinary checks for overflow are performed: if running in debug mode, overflows will panic, whereas in release mode overflows will silently wrap.
- String columns will sum to None.
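The Boolean and overflow rules above mirror ordinary Rust arithmetic, which can be shown without Polars. The helpers below are hypothetical and operate on slices: checked_sum makes the overflow visible as None (the case where a debug build would panic), and wrapping_sum shows the value a release build would silently produce.

```rust
/// Sum of a Boolean column: count the `true` values as a u32.
fn sum_bools(values: &[bool]) -> u32 {
    values.iter().filter(|v| **v).count() as u32
}

/// Checked integer sum: None marks the overflow a debug build would panic on.
fn checked_sum(values: &[i32]) -> Option<i32> {
    values.iter().try_fold(0i32, |acc, v| acc.checked_add(*v))
}

/// Wrapping integer sum: the result a release build produces on overflow.
fn wrapping_sum(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, v| acc.wrapping_add(*v))
}
```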
pub fn mean(self) -> LazyFrame
Aggregate all the columns as their mean values.
- Boolean and integer columns are converted to f64 before computing the mean.
- String columns will have a mean of None.
pub fn median(self) -> LazyFrame
Aggregate all the columns as their median values.
- Boolean and integer results are converted to f64. However, they are still susceptible to overflow before this conversion occurs.
- String columns will have a median of None.
pub fn quantile(self, quantile: Expr, method: QuantileMethod) -> LazyFrame
Aggregate all the columns as their quantile values.
pub fn std(self, ddof: u8) -> LazyFrame
Aggregate all the columns as their standard deviation values.
ddof is the “Delta Degrees of Freedom”; N - ddof will be the denominator when computing the variance, where N is the number of rows.
In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
Source: Numpy
pub fn var(self, ddof: u8) -> LazyFrame
Aggregate all the columns as their variance values.
ddof is the “Delta Degrees of Freedom”; N - ddof will be the denominator when computing the variance, where N is the number of rows.
In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Source: Numpy
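The role of ddof in std and var can be made concrete with a small standard-library sketch. The functions variance and std_dev below are hypothetical helpers over a slice of f64, not the Polars kernels; returning None when N <= ddof is a choice made for this sketch.

```rust
/// Variance with denominator N - ddof; None when N <= ddof (sketch choice).
fn variance(values: &[f64], ddof: u8) -> Option<f64> {
    let n = values.len();
    if n <= ddof as usize {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    let sum_sq: f64 = values.iter().map(|v| (v - mean).powi(2)).sum();
    Some(sum_sq / (n - ddof as usize) as f64)
}

/// Standard deviation as the square root of the variance estimate.
fn std_dev(values: &[f64], ddof: u8) -> Option<f64> {
    variance(values, ddof).map(f64::sqrt)
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 the squared deviations sum to 32, so ddof=0 divides by 8 and ddof=1 divides by 7.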
pub fn explode<E, IE>(self, columns: E) -> LazyFrame
Apply explode operation. See eager explode.
pub fn null_count(self) -> LazyFrame
Aggregate all the columns as the sum of their null value count.
pub fn unique_stable( self, subset: Option<Vec<PlSmallStr>>, keep_strategy: UniqueKeepStrategy, ) -> LazyFrame
Drop non-unique rows and maintain the order of kept rows.
subset is an optional Vec of column names to consider for uniqueness; if None, all columns are considered.
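A rough model of order-preserving deduplication on a subset of columns, using only the standard library. unique_stable_first is a hypothetical helper, not a Polars function: the key closure plays the role of subset, and the sketch always keeps the first occurrence, which is only one of the possible keep strategies.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the first row for each distinct key, preserving input order.
fn unique_stable_first<T: Clone, K: Hash + Eq>(rows: &[T], key: impl Fn(&T) -> K) -> Vec<T> {
    let mut seen = HashSet::new();
    rows.iter()
        // `insert` returns false for keys that were already seen.
        .filter(|row| seen.insert(key(row)))
        .cloned()
        .collect()
}
```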
pub fn unique_stable_generic<E, IE>( self, subset: Option<E>, keep_strategy: UniqueKeepStrategy, ) -> LazyFrame
pub fn unique( self, subset: Option<Vec<String>>, keep_strategy: UniqueKeepStrategy, ) -> LazyFrame
Drop non-unique rows without maintaining the order of kept rows.
The order of the kept rows may change; to maintain the original row order, use unique_stable.
subset is an optional Vec of column names to consider for uniqueness; if None, all columns are considered.
pub fn unique_generic<E, IE>( self, subset: Option<E>, keep_strategy: UniqueKeepStrategy, ) -> LazyFrame
pub fn drop_nans(self, subset: Option<Vec<Expr>>) -> LazyFrame
Drop rows containing one or more NaN values.
subset is an optional Vec of column names to consider for NaNs; if None, all floating point columns are considered.
pub fn drop_nulls(self, subset: Option<Vec<Expr>>) -> LazyFrame
Drop rows containing one or more None values.
subset is an optional Vec of column names to consider for nulls; if None, all columns are considered.
pub fn slice(self, offset: i64, len: u32) -> LazyFrame
Slice the DataFrame using an offset (starting row) and a length.
If offset is negative, it is counted from the end of the DataFrame. For instance, lf.slice(-5, 3) gets three rows, starting at the row fifth from the end.
If offset and len are such that the slice extends beyond the end of the DataFrame, the portion between offset and the end will be returned. In this case, the number of rows in the returned DataFrame will be less than len.
pub fn tail(self, n: u32) -> LazyFrame
Get the last n rows.
Equivalent to self.slice(-(n as i64), n).
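The offset and length rules for slice, and the tail equivalence, can be sketched as index arithmetic. resolve_slice and resolve_tail are hypothetical helpers that return a half-open row range rather than a frame. How an offset that points before the first row is treated is an assumption of this sketch (it clamps to row 0) and is not stated in the text above.

```rust
/// Resolve (offset, len) against a frame of `n_rows` rows and return the
/// half-open row range [start, end) that the slice covers.
fn resolve_slice(n_rows: usize, offset: i64, len: u32) -> (usize, usize) {
    let n = n_rows as i64;
    // A negative offset counts from the end. Clamping to row 0 is a sketch assumption.
    let start = if offset < 0 { (n + offset).max(0) } else { offset.min(n) };
    // A slice that would run past the end is cut off at the last row.
    let end = (start + len as i64).min(n);
    (start as usize, end as usize)
}

/// The last `n` rows, expressed through the slice rule.
fn resolve_tail(n_rows: usize, n: u32) -> (usize, usize) {
    resolve_slice(n_rows, -(n as i64), n)
}
```

On a 10-row frame, slice(-5, 3) resolves to rows 5 to 7, and slice(8, 5) is cut short to the two remaining rows.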
pub fn limit(self, n: u32) -> LazyFrame
Limit the DataFrame to the first n rows.
Note if you don’t want the rows to be scanned, use fetch.
pub fn map<F>( self, function: F, optimizations: OptFlags, schema: Option<Arc<dyn UdfSchema>>, name: Option<&'static str>, ) -> LazyFrame
Apply a function/closure once the logical plan gets executed.
The function has access to the whole materialized DataFrame at the time it is called.
To apply specific functions to specific columns, use Expr::map in conjunction with LazyFrame::with_column or with_columns.
§Warning
This can blow up in your face if the schema is changed due to the operation. The optimizer relies on a correct schema.
You can toggle certain optimizations off.
pub fn with_row_index<S>(self, name: S, offset: Option<u32>) -> LazyFrame
where
    S: Into<PlSmallStr>,
Add a new column at index 0 that counts the rows.
name is the name of the new column. offset is where to start counting from; if None, it is set to 0.
§Warning
This can have a negative effect on query performance. This may for instance block predicate pushdown optimization.
impl LazyFrame
pub fn anonymous_scan( function: Arc<dyn AnonymousScan>, args: ScanArgsAnonymous, ) -> Result<LazyFrame, PolarsError>
impl LazyFrame
pub fn scan_ipc( path: impl AsRef<Path>, args: ScanArgsIpc, ) -> Result<LazyFrame, PolarsError>
Create a LazyFrame directly from an IPC scan.
pub fn scan_ipc_files( paths: Arc<[PathBuf]>, args: ScanArgsIpc, ) -> Result<LazyFrame, PolarsError>
pub fn scan_ipc_sources( sources: ScanSources, args: ScanArgsIpc, ) -> Result<LazyFrame, PolarsError>
impl LazyFrame
pub fn scan_parquet( path: impl AsRef<Path>, args: ScanArgsParquet, ) -> Result<LazyFrame, PolarsError>
Create a LazyFrame directly from a parquet scan.
pub fn scan_parquet_sources( sources: ScanSources, args: ScanArgsParquet, ) -> Result<LazyFrame, PolarsError>
Create a LazyFrame directly from a parquet scan.
pub fn scan_parquet_files( paths: Arc<[PathBuf]>, args: ScanArgsParquet, ) -> Result<LazyFrame, PolarsError>
Create a LazyFrame directly from a parquet scan.
Trait Implementations§
impl From<LazyGroupBy> for LazyFrame
fn from(lgb: LazyGroupBy) -> LazyFrame
Auto Trait Implementations§
impl !Freeze for LazyFrame
impl !RefUnwindSafe for LazyFrame
impl Send for LazyFrame
impl Sync for LazyFrame
impl Unpin for LazyFrame
impl !UnwindSafe for LazyFrame
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.