A thread-safe reference-counting pointer. ‘Arc’ stands for ‘Atomically Reference Counted’.
Represents Arrow’s metadata of a “column”.
An ordered sequence of Fields with associated Metadata.
A valid Brotli compression level.
Specialized expressions for Categorical dtypes.
Utility struct for the when-then-otherwise expression.
Utility struct for the when-then-otherwise expression.
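For orientation, a minimal sketch of how these helpers chain in the expression DSL (column and label names are illustrative):

```rust
use polars::prelude::*;

// `when` opens the conditional, `then` maps the true branch,
// `otherwise` closes the chain into a plain Expr.
fn label(df: DataFrame) -> PolarsResult<DataFrame> {
    df.lazy()
        .with_column(
            when(col("score").gt(lit(0)))
                .then(lit("positive"))
                .otherwise(lit("non-positive"))
                .alias("label"),
        )
        .collect()
}
```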
ChunkedArray
Create a new DataFrame by reading a CSV file.
Write a DataFrame to CSV.
Options for writing CSV files.
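A rough sketch of an eager CSV round trip; the builder methods have shifted between Polars releases, so treat the exact calls (CsvReader::from_path, has_header) as assumptions:

```rust
use polars::prelude::*;
use std::fs::File;

fn csv_roundtrip() -> PolarsResult<()> {
    // Read a CSV file into a DataFrame (assumes CsvReader::from_path
    // is available in the version at hand).
    let mut df = CsvReader::from_path("input.csv")?
        .has_header(true)
        .finish()?;

    // CsvWriter wraps any io::Write.
    let mut file = File::create("output.csv")?;
    CsvWriter::new(&mut file).finish(&mut df)?;
    Ok(())
}
```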
A contiguous growable collection of Series that have the same length.
Arguments used by datetime in order to produce an Expr of Datetime.
Arguments used by duration in order to produce an Expr of Duration.
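A sketch of how these argument structs feed datetime and duration; the builder calls (DatetimeArgs::new, DurationArgs::new().with_days) reflect recent releases and are assumptions:

```rust
use polars::prelude::*;

// Build a Datetime column from per-row year/month/day columns and a
// Duration column from a per-row day count (column names illustrative).
fn build(df: DataFrame) -> PolarsResult<DataFrame> {
    df.lazy()
        .with_columns([
            datetime(DatetimeArgs::new(col("y"), col("m"), col("d"))).alias("ts"),
            duration(DurationArgs::new().with_days(col("days"))).alias("span"),
        ])
        .collect()
}
```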
Specialized expressions for modifying the name of existing expressions.
Characterizes the name and the DataType of a column.
Metadata for a Parquet file.
Returned by a group_by operation on a DataFrame. This struct supports several aggregations.
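For example, a sketch of the eager group_by-then-aggregate pattern (the select-then-sum chain is an assumption based on the eager GroupBy API; column names are illustrative):

```rust
use polars::prelude::*;

fn totals(df: &DataFrame) -> PolarsResult<DataFrame> {
    // Group rows by `category`, then sum `amount` within each group.
    df.group_by(["category"])?.select(["amount"]).sum()
}
```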
Indexes of the groups, with the first index stored separately; this makes sorting fast.
A valid Gzip compression level.
Read Arrow’s IPC format into a DataFrame.
Read Arrow’s streaming IPC format into a DataFrame.
Write a DataFrame to Arrow’s streaming IPC format.
Write a DataFrame to Arrow’s IPC format.
Reads JSON in one of the formats in JsonFormat into a DataFrame.
Writes a DataFrame to JSON.
Lazy abstraction over an eager DataFrame. It really is an abstraction over a logical plan. The methods of this struct will incrementally modify a logical plan until output is requested (via collect).
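Concretely, nothing executes until collect is called:

```rust
use polars::prelude::*;

fn query(df: DataFrame) -> PolarsResult<DataFrame> {
    df.lazy()                              // start building a logical plan
        .filter(col("value").gt(lit(10)))  // still only extending the plan
        .select([col("name"), col("value")])
        .collect()                         // optimize and execute here
}
```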
Utility struct for lazy group_by operation.
Maps a logical type to a chunked array implementation of the physical type. This saves a lot of compiler bloat and allows us to reuse functionality.
Arguments for the DataFrame::melt function.
Just a wrapper structure. Useful for certain impl specializations. This is for instance used to implement impl<T> FromIterator<T::Native> for NoNull<ChunkedArray<T>>, as Option<T::Native> was already implemented: impl<T> FromIterator<Option<T::Native>> for ChunkedArray<T>.
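In practice this lets you collect bare values without wrapping each in Some; a minimal sketch:

```rust
use polars::prelude::*;

fn collect_without_nulls() {
    // Collect plain i32 values via the NoNull FromIterator impl,
    // then unwrap back to the underlying ChunkedArray.
    let ca: NoNull<ChunkedArray<Int32Type>> = (0..5).collect();
    let ca: ChunkedArray<Int32Type> = ca.into_inner();
    assert_eq!(ca.len(), 5);
}
```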
The literal Null
State of the allowed optimizations
Read Apache parquet format into a DataFrame.
Arrow-deserialized parquet Statistics of a file
Write a DataFrame to Parquet format.
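A sketch of an eager Parquet round trip (paths illustrative):

```rust
use polars::prelude::*;
use std::fs::File;

fn parquet_roundtrip(df: &mut DataFrame) -> PolarsResult<()> {
    // ParquetWriter wraps any io::Write.
    let file = File::create("data.parquet")?;
    ParquetWriter::new(file).finish(df)?;

    // Read the file back into a DataFrame.
    let file = File::open("data.parquet")?;
    let _df = ParquetReader::new(file).finish()?;
    Ok(())
}
```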
A map from field/column name (String) to the type of that field/column (DataType).
Options to serialize logical types to CSV.
Series
Sort options for multi-series sorting.
Options for single series sorting.
Wrapper type that has special equality properties depending on the inner type specialization.
The statistics to write
Enable the global string cache as long as the object is alive (RAII).
A StructArray is a nested Array with an optional validity, representing multiple Arrays with the same number of rows.
This is the logical type StructChunked, which dispatches most logic to the fields’ implementations.
Specialized expressions for Struct dtypes.
Utility struct for the when-then-otherwise expression.
Represents a user-defined function
Utility struct for the when-then-otherwise expression.
Represents a window in time
A valid Zstandard compression level.
Argmin/Argmax
Aggregation operations.
Aggregations that return Series of unit length. Those can be used in broadcasting operations.
Fastest way to do elementwise operations on a ChunkedArray<T> when the operation is cheaper than branching due to null checking.
Apply kernels on the arrow array chunks in a ChunkedArray.
Cast ChunkedArray<T> to ChunkedArray<N>.
Create a new ChunkedArray filled with values at that index.
Explode/flatten a List or String Series
Replace None values with a value
Filter values by a boolean mask.
Fill a ChunkedArray with one value.
Quantile and median aggregation.
This differs from ChunkWindowCustom and ChunkWindow by not using a fold aggregator, but reusing a Series wrapper and calling Series aggregators. This is likely a bit slower than ChunkWindow.
Create a ChunkedArray with new values by index or by boolean mask. Note that these operations clone data. This is however the only way we can modify at mask or index level, as the underlying Arrow arrays are immutable.
Sort operations on ChunkedArray.
Get unique values in a ChunkedArray
Variance and standard deviation aggregation.
This trait exists to unify the API of Polars’ Schema and Arrow’s Schema.
Used to create the tuples for a group_by operation.
Mask the first unique values as true
Mask the last unique values as true
Reads a LazyFrame from a filesystem or cloud storage. Supports glob patterns.
Safety
Values need to implement this so that they can be stored into a Series and DataFrame
A wrapper trait for any binary closure Fn(Series, Series) -> PolarsResult<Series>
A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
Utility trait to slice concrete arrow arrays whilst keeping their concrete type. E.g. don’t return Box<dyn Array>.
Convert numerical values to their absolute value.
Selects all columns. Shorthand for col("*").
Create a new column with the bitwise-and of the elements in each row.
Create a new column with the bitwise-or of the elements in each row.
Like map_binary, but used in a group_by-aggregation context.
Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
Generate a range of integers.
Find the indexes that would sort these series in order of appearance. That means that the first Series will be used to determine the ordering until duplicates are found. Once duplicates are found, the next Series will be used and so on.
Get the indices where condition evaluates true.
Find the mean of all the values in the column named name. Alias for mean.
Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
Casts the column given by Expr to a different type.
Checks if the projected columns are equal
Checks if the projected columns are equal
Set values outside the given boundaries to the boundary value.
Set values above the given maximum to the maximum value.
Set values below the given minimum to the minimum value.
Folds the expressions from left to right keeping the first non-null values.
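For instance, a sketch that takes the first non-null value per row from two columns, with a literal fallback:

```rust
use polars::prelude::*;

// coalesce takes a slice of expressions, evaluated left to right.
fn first_non_null() -> Expr {
    coalesce(&[col("a"), col("b"), lit(0)])
}
```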
Create a Column Expression based on a column name.
Select multiple columns by name.
Concatenate list entries.
Horizontally concatenate string columns in linear time.
Cast null arrays to inner type and ensure that all offsets remain correct
Read the number of rows without parsing columns; useful for count(*) queries.
Accumulate over multiple columns horizontally / row wise.
Get an array with the cumulative max computed at every element.
Get an array with the cumulative min computed at every element.
Get an array with the cumulative product computed at every element.
Accumulate over multiple columns horizontally / row wise.
Get an array with the cumulative sum computed at every element.
Create a column of date ranges from a start and stop expression.
Construct a column of Datetime from the provided DatetimeArgs.
Create a datetime range from a start and stop expression.
Create a column of datetime ranges from a start and stop expression.
Deserializes the statistics in the column chunks from a single row_group into Statistics associated with the field’s name.
Select multiple columns by dtype.
Select multiple columns by dtype.
First column in a DataFrame.
Accumulate over multiple columns horizontally / row wise.
Format the results of an array of expressions using a format string
Compute remaining_rows_to_read to be taken per file up front, so we can actually read concurrently/in parallel.
Different from group_by_windows, which defines window buckets and searches for the values that fit each pre-defined bucket, this function defines every window based on:
- the timestamp (lower bound)
- the timestamp + period (upper bound)
where the timestamps are the individual values in the array time.
Window boundaries are created based on the given Window.
Horizontally concatenate all strings.
If ambiguous is length-1 and not equal to “null”, we can take a slightly faster path.
Select multiple columns by index.
Infer the schema of a CSV file by reading through the first n rows of the file, with max_read_rows controlling the maximum number of rows to read.
Generate a range of integers.
Generate a range of integers for each row of the input columns.
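A sketch of int_range through the lazy engine; the (start, end, step, dtype) signature reflects recent releases and is an assumption:

```rust
use polars::prelude::*;

fn range_column() -> PolarsResult<DataFrame> {
    // Materialize a single column holding 0..10.
    df!("seed" => [0])?
        .lazy()
        .select([int_range(lit(0), lit(10), 1, DataType::Int64).alias("n")])
        .collect()
}
```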
Check if the path is a cloud url.
Check if a CSV file is compressed.
A column which is false wherever expr is null, true elsewhere.
A column which is true wherever expr is null, false elsewhere.
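For example, keeping only the non-null rows of a column (a sketch using the equivalent Expr methods):

```rust
use polars::prelude::*;

fn drop_null_rows(df: DataFrame) -> PolarsResult<DataFrame> {
    // `is_null` would select the complementary rows.
    df.lazy().filter(col("a").is_not_null()).collect()
}
```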
May give false negatives because it ignores the null values.
Last column in a DataFrame.
Return the number of rows in the context.
Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
Apply a closure on the two columns that are evaluated from Expr a and Expr b.
Apply a function/closure over multiple columns once the logical plan gets executed.
Apply a function/closure over multiple columns once the logical plan gets executed.
Find the maximum of all the values in the column named name. Shorthand for col(name).max().
Find the mean of all the values in the column named name. Shorthand for col(name).mean().
Find the median of all the values in the column named name. Shorthand for col(name).median().
Find the minimum of all the values in the column named name. Shorthand for col(name).min().
Negates a boolean column.
Nth column in a DataFrame.
Find a specific quantile of all the values in the column named name.
Create a column of length n containing n copies of the literal value. Generally you won’t need this function, as lit(value) already represents a column containing only value, whose length is automatically set to the correct number of rows.
Sum all the values in the column named name. Shorthand for col(name).sum().
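These shorthands compose like any other expression; a sketch (column name illustrative):

```rust
use polars::prelude::*;

fn summary(df: DataFrame) -> PolarsResult<DataFrame> {
    // Each call is shorthand for col("amount").max() / .mean() / .sum().
    df.lazy()
        .select([
            max("amount").alias("max"),
            mean("amount").alias("mean"),
            sum("amount").alias("sum"),
        ])
        .collect()
}
```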
Create a column of time ranges from a start and stop expression.
Start a when-then-otherwise expression.