Getting started
This chapter is here to help you get started with Polars. It covers the fundamental features and functionality of the library, making it easy for new users to familiarise themselves with the basics, from initial installation and setup to core functionality. If you're already an advanced user or familiar with dataframes, feel free to skip ahead to the next chapter about installation options.
Installing Polars
pip install polars
cargo add polars -F lazy
# Or add it to Cargo.toml manually
[dependencies]
polars = { version = "x", features = ["lazy", ...]}
Reading & writing
Polars supports reading and writing of common file formats (e.g. CSV, JSON, Parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. Postgres, MySQL). Below we show the concept of reading from and writing to disk.
import polars as pl
from datetime import datetime
df = pl.DataFrame(
{
"integer": [1, 2, 3],
"date": [
datetime(2025, 1, 1),
datetime(2025, 1, 2),
datetime(2025, 1, 3),
],
"float": [4.0, 5.0, 6.0],
"string": ["a", "b", "c"],
}
)
print(df)
use std::fs::File;
use chrono::prelude::*;
use polars::prelude::*;
let mut df: DataFrame = df!(
"integer" => &[1, 2, 3],
"date" => &[
NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"float" => &[4.0, 5.0, 6.0],
"string" => &["a", "b", "c"],
)
.unwrap();
println!("{}", df);
shape: (3, 4)
┌─────────┬─────────────────────┬───────┬────────┐
│ integer ┆ date ┆ float ┆ string │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ f64 ┆ str │
╞═════════╪═════════════════════╪═══════╪════════╡
│ 1 ┆ 2025-01-01 00:00:00 ┆ 4.0 ┆ a │
│ 2 ┆ 2025-01-02 00:00:00 ┆ 5.0 ┆ b │
│ 3 ┆ 2025-01-03 00:00:00 ┆ 6.0 ┆ c │
└─────────┴─────────────────────┴───────┴────────┘
In the example below we write the DataFrame to a CSV file called output.csv. After that, we read it back using read_csv and then print the result for inspection.
df.write_csv("docs/data/output.csv")
df_csv = pl.read_csv("docs/data/output.csv")
print(df_csv)
In Rust, CsvWriter and CsvReader are available on the csv feature.
let mut file = File::create("docs/data/output.csv").expect("could not create file");
CsvWriter::new(&mut file)
.include_header(true)
.with_separator(b',')
.finish(&mut df)?;
let df_csv = CsvReadOptions::default()
.with_infer_schema_length(None)
.with_has_header(true)
.try_into_reader_with_file_path(Some("docs/data/output.csv".into()))?
.finish()?;
println!("{}", df_csv);
shape: (3, 4)
┌─────────┬────────────────────────────┬───────┬────────┐
│ integer ┆ date ┆ float ┆ string │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ f64 ┆ str │
╞═════════╪════════════════════════════╪═══════╪════════╡
│ 1 ┆ 2025-01-01T00:00:00.000000 ┆ 4.0 ┆ a │
│ 2 ┆ 2025-01-02T00:00:00.000000 ┆ 5.0 ┆ b │
│ 3 ┆ 2025-01-03T00:00:00.000000 ┆ 6.0 ┆ c │
└─────────┴────────────────────────────┴───────┴────────┘
For more examples on the CSV file format and other data formats, start with the IO section of the user guide.
Expressions
Expressions are the core strength of Polars. They offer a modular structure that allows you to combine simple concepts into complex queries. Below we cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:
select
filter
with_columns
group_by
To learn more about expressions and the context in which they operate, see the user guide sections: Contexts and Expressions.
Select
To select a column we need to do two things:
- Define the DataFrame we want the data from.
- Select the data that we need.
In the example below you see that we select col('*'). The asterisk stands for all columns.
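The construction of the DataFrame used in the expression examples is not included in this extract; a minimal sketch that reproduces its shape (columns a, b, c and d, with random values in b), followed by the select call, might look like this:
import numpy as np
from datetime import datetime, timedelta

import polars as pl

# Hypothetical reconstruction of the example DataFrame: an integer column,
# a random float column, a range of dates and a float column containing a NaN and a null.
df = pl.DataFrame(
    {
        "a": range(5),
        "b": np.random.rand(5),
        "c": [datetime(2025, 12, 1) + timedelta(days=i) for i in range(5)],
        "d": [1.0, 2.0, float("nan"), -42.0, None],
    }
)

# Select all columns; the asterisk matches every column.
print(df.select(pl.col("*")))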
shape: (5, 4)
┌─────┬──────────┬─────────────────────┬───────┐
│ a ┆ b ┆ c ┆ d │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ datetime[μs] ┆ f64 │
╞═════╪══════════╪═════════════════════╪═══════╡
│ 0 ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0 │
│ 1 ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0 │
│ 2 ┆ 0.08584 ┆ 2025-12-03 00:00:00 ┆ NaN │
│ 3 ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 │
│ 4 ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ null │
└─────┴──────────┴─────────────────────┴───────┘
You can also select only the specific columns you want to return. There are two ways to do this. The first option is to pass the column names, as seen below.
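The exact call is not shown in this extract; a sketch of the first option, passing the column names to pl.col:
# Select only columns "a" and "b" by name.
print(df.select(pl.col("a", "b")))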
shape: (5, 2)
┌─────┬──────────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪══════════╡
│ 0 ┆ 0.418342 │
│ 1 ┆ 0.178213 │
│ 2 ┆ 0.08584 │
│ 3 ┆ 0.811393 │
│ 4 ┆ 0.089994 │
└─────┴──────────┘
Follow these links to other parts of the user guide to learn more about basic operations or column selections.
Filter
The filter option allows us to create a subset of the DataFrame. We use the same DataFrame as earlier and filter between two specified dates.
df.filter(
pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
)
let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
let out = df
.clone()
.lazy()
.filter(
col("c")
.gt_eq(lit(start_date))
.and(col("c").lt_eq(lit(end_date))),
)
.collect()?;
println!("{}", out);
shape: (2, 4)
┌─────┬──────────┬─────────────────────┬─────┐
│ a ┆ b ┆ c ┆ d │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ datetime[μs] ┆ f64 │
╞═════╪══════════╪═════════════════════╪═════╡
│ 1 ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0 │
│ 2 ┆ 0.08584 ┆ 2025-12-03 00:00:00 ┆ NaN │
└─────┴──────────┴─────────────────────┴─────┘
With filter you can also create more complex filters that include multiple columns.
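The filter used here is not shown in this extract; one possible combination that reproduces the output below restricts a and drops NaN values in d (a sketch, not necessarily the original call):
# Keep rows where "a" is at most 3 and "d" is not NaN.
print(df.filter((pl.col("a") <= 3) & (pl.col("d").is_not_nan())))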
shape: (3, 4)
┌─────┬──────────┬─────────────────────┬───────┐
│ a ┆ b ┆ c ┆ d │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ datetime[μs] ┆ f64 │
╞═════╪══════════╪═════════════════════╪═══════╡
│ 0 ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0 │
│ 1 ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0 │
│ 3 ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 │
└─────┴──────────┴─────────────────────┴───────┘
Add columns
with_columns allows you to create new columns for your analyses. We create two new columns, e and b+42. First we sum all values from column b and store the results in column e. After that we add 42 to the values of b and store the results in a new column b+42.
df.with_columns(pl.col("b").sum().alias("e"), (pl.col("b") + 42).alias("b+42"))
let out = df
.clone()
.lazy()
.with_columns([
col("b").sum().alias("e"),
(col("b") + lit(42)).alias("b+42"),
])
.collect()?;
println!("{}", out);
shape: (5, 6)
┌─────┬──────────┬─────────────────────┬───────┬──────────┬───────────┐
│ a ┆ b ┆ c ┆ d ┆ e ┆ b+42 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ datetime[μs] ┆ f64 ┆ f64 ┆ f64 │
╞═════╪══════════╪═════════════════════╪═══════╪══════════╪═══════════╡
│ 0 ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0 ┆ 1.583783 ┆ 42.418342 │
│ 1 ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0 ┆ 1.583783 ┆ 42.178213 │
│ 2 ┆ 0.08584 ┆ 2025-12-03 00:00:00 ┆ NaN ┆ 1.583783 ┆ 42.08584 │
│ 3 ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 ┆ 1.583783 ┆ 42.811393 │
│ 4 ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ null ┆ 1.583783 ┆ 42.089994 │
└─────┴──────────┴─────────────────────┴───────┴──────────┴───────────┘
Group by
We will create a new DataFrame for the group by functionality. This new DataFrame will include several 'groups' that we want to group by.
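The construction of this DataFrame is not included in the extract; a sketch that matches the output below:
# Hypothetical reconstruction: eight rows with a group label in "y".
df2 = pl.DataFrame(
    {
        "x": range(8),
        "y": ["A", "A", "A", "B", "B", "C", "X", "X"],
    }
)
print(df2)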
shape: (8, 2)
┌─────┬─────┐
│ x ┆ y │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0 ┆ A │
│ 1 ┆ A │
│ 2 ┆ A │
│ 3 ┆ B │
│ 4 ┆ B │
│ 5 ┆ C │
│ 6 ┆ X │
│ 7 ┆ X │
└─────┴─────┘
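The first grouped result counts the rows per group; a call along these lines produces the output below (maintain_order keeps the printed group order stable):
# Count the number of rows in each group of "y".
print(df2.group_by("y", maintain_order=True).len())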
shape: (4, 2)
┌─────┬─────┐
│ y ┆ len │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════╪═════╡
│ A ┆ 3 │
│ B ┆ 2 │
│ C ┆ 1 │
│ X ┆ 2 │
└─────┴─────┘
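The second grouped result aggregates per group; a sketch that matches the count and sum columns shown below:
# Per group: count the rows and sum the values of "x".
print(
    df2.group_by("y", maintain_order=True).agg(
        pl.col("x").count().alias("count"),
        pl.col("x").sum().alias("sum"),
    )
)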
shape: (4, 3)
┌─────┬───────┬─────┐
│ y ┆ count ┆ sum │
│ --- ┆ --- ┆ --- │
│ str ┆ u32 ┆ i64 │
╞═════╪═══════╪═════╡
│ A ┆ 3 ┆ 3 │
│ B ┆ 2 ┆ 7 │
│ C ┆ 1 ┆ 5 │
│ X ┆ 2 ┆ 13 │
└─────┴───────┴─────┘
Combination
Below are some examples of how to combine operations to create the DataFrame you require.
df_x = df.with_columns((pl.col("a") * pl.col("b")).alias("a * b")).select(
pl.all().exclude(["c", "d"])
)
print(df_x)
let out = df
.clone()
.lazy()
.with_columns([(col("a") * col("b")).alias("a * b")])
.select([col("*").exclude(["c", "d"])])
.collect()?;
println!("{}", out);
shape: (5, 3)
┌─────┬──────────┬──────────┐
│ a ┆ b ┆ a * b │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪══════════╪══════════╡
│ 0 ┆ 0.418342 ┆ 0.0 │
│ 1 ┆ 0.178213 ┆ 0.178213 │
│ 2 ┆ 0.08584 ┆ 0.17168 │
│ 3 ┆ 0.811393 ┆ 2.43418 │
│ 4 ┆ 0.089994 ┆ 0.359977 │
└─────┴──────────┴──────────┘
df_y = df.with_columns((pl.col("a") * pl.col("b")).alias("a * b")).select(
pl.all().exclude("d")
)
print(df_y)
let out = df
.clone()
.lazy()
.with_columns([(col("a") * col("b")).alias("a * b")])
.select([col("*").exclude(["d"])])
.collect()?;
println!("{}", out);
shape: (5, 4)
┌─────┬──────────┬─────────────────────┬──────────┐
│ a ┆ b ┆ c ┆ a * b │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ datetime[μs] ┆ f64 │
╞═════╪══════════╪═════════════════════╪══════════╡
│ 0 ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 0.0 │
│ 1 ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 0.178213 │
│ 2 ┆ 0.08584 ┆ 2025-12-03 00:00:00 ┆ 0.17168 │
│ 3 ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ 2.43418 │
│ 4 ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ 0.359977 │
└─────┴──────────┴─────────────────────┴──────────┘
Combining DataFrames
There are two ways DataFrames can be combined, depending on the use case: join and concat.
Join
Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to join two DataFrames into a single DataFrame. Our two DataFrames both have an 'id'-like column: a and x. We can use those columns to join the DataFrames in this example.
import numpy as np

df = pl.DataFrame(
{
"a": range(8),
"b": np.random.rand(8),
"d": [1.0, 2.0, float("nan"), float("nan"), 0.0, -5.0, -42.0, None],
}
)
df2 = pl.DataFrame(
{
"x": range(8),
"y": ["A", "A", "A", "B", "B", "C", "X", "X"],
}
)
joined = df.join(df2, left_on="a", right_on="x")
print(joined)
use rand::Rng;
let mut rng = rand::thread_rng();
let df: DataFrame = df!(
"a" => 0..8,
"b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"d"=> [Some(1.0), Some(2.0), Some(f64::NAN), Some(f64::NAN), Some(0.0), Some(-5.0), Some(-42.), None]
)
.unwrap();
let df2: DataFrame = df!(
"x" => 0..8,
"y"=> &["A", "A", "A", "B", "B", "C", "X", "X"],
)
.unwrap();
let joined = df.join(&df2, ["a"], ["x"], JoinType::Left.into())?;
println!("{}", joined);
shape: (8, 4)
┌─────┬──────────┬───────┬─────┐
│ a ┆ b ┆ d ┆ y │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ str │
╞═════╪══════════╪═══════╪═════╡
│ 0 ┆ 0.997102 ┆ 1.0 ┆ A │
│ 1 ┆ 0.265668 ┆ 2.0 ┆ A │
│ 2 ┆ 0.884949 ┆ NaN ┆ A │
│ 3 ┆ 0.330684 ┆ NaN ┆ B │
│ 4 ┆ 0.005853 ┆ 0.0 ┆ B │
│ 5 ┆ 0.57128 ┆ -5.0 ┆ C │
│ 6 ┆ 0.120339 ┆ -42.0 ┆ X │
│ 7 ┆ 0.918111 ┆ null ┆ X │
└─────┴──────────┴───────┴─────┘
To see more examples with other types of joins, see the Transformations section in the user guide.
Concat
We can also concatenate two DataFrames. Vertical concatenation will make the DataFrame longer. Horizontal concatenation will make the DataFrame wider. Below you can see the result of a horizontal concatenation of our two DataFrames.
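The concat call itself is not shown in this extract; a sketch that produces the horizontal result below:
# Stack df and df2 side by side; both have eight rows.
df_horizontal = pl.concat([df, df2], how="horizontal")
print(df_horizontal)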
shape: (8, 5)
┌─────┬──────────┬───────┬─────┬─────┐
│ a ┆ b ┆ d ┆ x ┆ y │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ i64 ┆ str │
╞═════╪══════════╪═══════╪═════╪═════╡
│ 0 ┆ 0.997102 ┆ 1.0 ┆ 0 ┆ A │
│ 1 ┆ 0.265668 ┆ 2.0 ┆ 1 ┆ A │
│ 2 ┆ 0.884949 ┆ NaN ┆ 2 ┆ A │
│ 3 ┆ 0.330684 ┆ NaN ┆ 3 ┆ B │
│ 4 ┆ 0.005853 ┆ 0.0 ┆ 4 ┆ B │
│ 5 ┆ 0.57128 ┆ -5.0 ┆ 5 ┆ C │
│ 6 ┆ 0.120339 ┆ -42.0 ┆ 6 ┆ X │
│ 7 ┆ 0.918111 ┆ null ┆ 7 ┆ X │
└─────┴──────────┴───────┴─────┴─────┘