Getting started

This chapter helps you get started with Polars. It covers the fundamental features of the library, from initial installation and setup to core functionality, so that new users can familiarise themselves with the basics. If you're already an advanced user or familiar with DataFrames, feel free to skip ahead to the next chapter about installation options.

Installing Polars

pip install polars
cargo add polars -F lazy

# Or Cargo.toml
[dependencies]
polars = { version = "x", features = ["lazy", ...]}

Reading & writing

Polars supports reading and writing common file formats (e.g. CSV, JSON, Parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. PostgreSQL, MySQL). Below we show the concept of reading and writing to disk.

DataFrame

import polars as pl
from datetime import datetime

df = pl.DataFrame(
    {
        "integer": [1, 2, 3],
        "date": [
            datetime(2025, 1, 1),
            datetime(2025, 1, 2),
            datetime(2025, 1, 3),
        ],
        "float": [4.0, 5.0, 6.0],
        "string": ["a", "b", "c"],
    }
)

print(df)

DataFrame

use std::fs::File;

use chrono::prelude::*;
use polars::prelude::*;

let mut df: DataFrame = df!(
    "integer" => &[1, 2, 3],
    "date" => &[
            NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
            NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
            NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
    ],
    "float" => &[4.0, 5.0, 6.0],
    "string" => &["a", "b", "c"],
)
.unwrap();
println!("{}", df);

shape: (3, 4)
┌─────────┬─────────────────────┬───────┬────────┐
│ integer ┆ date                ┆ float ┆ string │
│ ---     ┆ ---                 ┆ ---   ┆ ---    │
│ i64     ┆ datetime[μs]        ┆ f64   ┆ str    │
╞═════════╪═════════════════════╪═══════╪════════╡
│ 1       ┆ 2025-01-01 00:00:00 ┆ 4.0   ┆ a      │
│ 2       ┆ 2025-01-02 00:00:00 ┆ 5.0   ┆ b      │
│ 3       ┆ 2025-01-03 00:00:00 ┆ 6.0   ┆ c      │
└─────────┴─────────────────────┴───────┴────────┘

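In the example below we write the DataFrame to a CSV file called output.csv. After that, we read it back using read_csv and print the result for inspection. Note in the output that the date column comes back as str: CSV stores no type information, and dates are not parsed unless requested.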

read_csv · write_csv

df.write_csv("docs/data/output.csv")
df_csv = pl.read_csv("docs/data/output.csv")
print(df_csv)

CsvReader · CsvWriter · Available on feature csv

let mut file = File::create("docs/data/output.csv").expect("could not create file");
CsvWriter::new(&mut file)
    .include_header(true)
    .with_separator(b',')
    .finish(&mut df)?;
let df_csv = CsvReadOptions::default()
    .with_infer_schema_length(None)
    .with_has_header(true)
    .try_into_reader_with_file_path(Some("docs/data/output.csv".into()))?
    .finish()?;
println!("{}", df_csv);

shape: (3, 4)
┌─────────┬────────────────────────────┬───────┬────────┐
│ integer ┆ date                       ┆ float ┆ string │
│ ---     ┆ ---                        ┆ ---   ┆ ---    │
│ i64     ┆ str                        ┆ f64   ┆ str    │
╞═════════╪════════════════════════════╪═══════╪════════╡
│ 1       ┆ 2025-01-01T00:00:00.000000 ┆ 4.0   ┆ a      │
│ 2       ┆ 2025-01-02T00:00:00.000000 ┆ 5.0   ┆ b      │
│ 3       ┆ 2025-01-03T00:00:00.000000 ┆ 6.0   ┆ c      │
└─────────┴────────────────────────────┴───────┴────────┘

For more examples on the CSV file format and other data formats, start with the IO section of the user guide.

Expressions

Expressions are the core strength of Polars. They offer a modular structure that allows you to combine simple concepts into complex queries. Below we cover the basic components that serve as building blocks (or, in Polars terminology, contexts) for all your queries:

  • select
  • filter
  • with_columns
  • group_by

To learn more about expressions and the context in which they operate, see the user guide sections: Contexts and Expressions.

Select

To select a column we need to do two things:

  1. Define the DataFrame we want the data from.
  2. Select the data that we need.

In the example below we select col('*'), where the asterisk stands for all columns. The DataFrame used in this section is the one with columns a, b, c and d shown in the output.

select

df.select(pl.col("*"))

select

let out = df.clone().lazy().select([col("*")]).collect()?;
println!("{}", out);

shape: (5, 4)
┌─────┬──────────┬─────────────────────┬───────┐
│ a   ┆ b        ┆ c                   ┆ d     │
│ --- ┆ ---      ┆ ---                 ┆ ---   │
│ i64 ┆ f64      ┆ datetime[μs]        ┆ f64   │
╞═════╪══════════╪═════════════════════╪═══════╡
│ 0   ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0   │
│ 1   ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0   │
│ 2   ┆ 0.08584  ┆ 2025-12-03 00:00:00 ┆ NaN   │
│ 3   ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 │
│ 4   ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ null  │
└─────┴──────────┴─────────────────────┴───────┘

You can also specify the columns that you want to return by passing the column names, as seen below.

select

df.select(pl.col("a", "b"))

select

let out = df.clone().lazy().select([col("a"), col("b")]).collect()?;
println!("{}", out);

shape: (5, 2)
┌─────┬──────────┐
│ a   ┆ b        │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 0   ┆ 0.418342 │
│ 1   ┆ 0.178213 │
│ 2   ┆ 0.08584  │
│ 3   ┆ 0.811393 │
│ 4   ┆ 0.089994 │
└─────┴──────────┘

To learn more, see the user guide sections on basic operations and column selections.

Filter

The filter context allows us to create a subset of the DataFrame. We use the same DataFrame as earlier and keep the rows whose value in column c lies between two specified dates. Both bounds are included, as the output shows.

filter

df.filter(
    pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
)

filter

let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
    .unwrap()
    .and_hms_opt(0, 0, 0)
    .unwrap();
let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
    .unwrap()
    .and_hms_opt(0, 0, 0)
    .unwrap();
let out = df
    .clone()
    .lazy()
    .filter(
        col("c")
            .gt_eq(lit(start_date))
            .and(col("c").lt_eq(lit(end_date))),
    )
    .collect()?;
println!("{}", out);

shape: (2, 4)
┌─────┬──────────┬─────────────────────┬─────┐
│ a   ┆ b        ┆ c                   ┆ d   │
│ --- ┆ ---      ┆ ---                 ┆ --- │
│ i64 ┆ f64      ┆ datetime[μs]        ┆ f64 │
╞═════╪══════════╪═════════════════════╪═════╡
│ 1   ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0 │
│ 2   ┆ 0.08584  ┆ 2025-12-03 00:00:00 ┆ NaN │
└─────┴──────────┴─────────────────────┴─────┘

With filter you can also build more complex predicates that involve multiple columns.

filter

df.filter((pl.col("a") <= 3) & (pl.col("d").is_not_nan()))

filter

let out = df
    .clone()
    .lazy()
    .filter(col("a").lt_eq(3).and(col("d").is_not_nan()))
    .collect()?;
println!("{}", out);

shape: (3, 4)
┌─────┬──────────┬─────────────────────┬───────┐
│ a   ┆ b        ┆ c                   ┆ d     │
│ --- ┆ ---      ┆ ---                 ┆ ---   │
│ i64 ┆ f64      ┆ datetime[μs]        ┆ f64   │
╞═════╪══════════╪═════════════════════╪═══════╡
│ 0   ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0   │
│ 1   ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0   │
│ 3   ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 │
└─────┴──────────┴─────────────────────┴───────┘

Add columns

with_columns allows you to create new columns for your analyses. We create two new columns, e and b+42. First we sum all values from column b and store the result in column e; the single sum is repeated on every row. Then we add 42 to each value of b and store the results in a new column b+42.

with_columns

df.with_columns(pl.col("b").sum().alias("e"), (pl.col("b") + 42).alias("b+42"))

with_columns

let out = df
    .clone()
    .lazy()
    .with_columns([
        col("b").sum().alias("e"),
        (col("b") + lit(42)).alias("b+42"),
    ])
    .collect()?;
println!("{}", out);

shape: (5, 6)
┌─────┬──────────┬─────────────────────┬───────┬──────────┬───────────┐
│ a   ┆ b        ┆ c                   ┆ d     ┆ e        ┆ b+42      │
│ --- ┆ ---      ┆ ---                 ┆ ---   ┆ ---      ┆ ---       │
│ i64 ┆ f64      ┆ datetime[μs]        ┆ f64   ┆ f64      ┆ f64       │
╞═════╪══════════╪═════════════════════╪═══════╪══════════╪═══════════╡
│ 0   ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 1.0   ┆ 1.583783 ┆ 42.418342 │
│ 1   ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 2.0   ┆ 1.583783 ┆ 42.178213 │
│ 2   ┆ 0.08584  ┆ 2025-12-03 00:00:00 ┆ NaN   ┆ 1.583783 ┆ 42.08584  │
│ 3   ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ -42.0 ┆ 1.583783 ┆ 42.811393 │
│ 4   ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ null  ┆ 1.583783 ┆ 42.089994 │
└─────┴──────────┴─────────────────────┴───────┴──────────┴───────────┘

Group by

We will create a new DataFrame for the Group by functionality. This new DataFrame will include several 'groups' that we want to group by.

DataFrame

df2 = pl.DataFrame(
    {
        "x": range(8),
        "y": ["A", "A", "A", "B", "B", "C", "X", "X"],
    }
)

DataFrame

let df2: DataFrame = df!("x" => 0..8,
    "y"=> &["A", "A", "A", "B", "B", "C", "X", "X"],
)
.expect("should not fail");
println!("{}", df2);

shape: (8, 2)
┌─────┬─────┐
│ x   ┆ y   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0   ┆ A   │
│ 1   ┆ A   │
│ 2   ┆ A   │
│ 3   ┆ B   │
│ 4   ┆ B   │
│ 5   ┆ C   │
│ 6   ┆ X   │
│ 7   ┆ X   │
└─────┴─────┘

group_by

df2.group_by("y", maintain_order=True).len()

group_by

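// group_by_stable keeps groups in order of first appearance, like maintain_order=True in Python
let out = df2.clone().lazy().group_by_stable(["y"]).agg([len()]).collect()?;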
println!("{}", out);

shape: (4, 2)
┌─────┬─────┐
│ y   ┆ len │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════╪═════╡
│ A   ┆ 3   │
│ B   ┆ 2   │
│ C   ┆ 1   │
│ X   ┆ 2   │
└─────┴─────┘

group_by

df2.group_by("y", maintain_order=True).agg(
    pl.col("*").count().alias("count"),
    pl.col("*").sum().alias("sum"),
)

group_by

let out = df2
    .clone()
    .lazy()
    .group_by_stable(["y"])
    .agg([col("*").count().alias("count"), col("*").sum().alias("sum")])
    .collect()?;
println!("{}", out);

shape: (4, 3)
┌─────┬───────┬─────┐
│ y   ┆ count ┆ sum │
│ --- ┆ ---   ┆ --- │
│ str ┆ u32   ┆ i64 │
╞═════╪═══════╪═════╡
│ A   ┆ 3     ┆ 3   │
│ B   ┆ 2     ┆ 7   │
│ C   ┆ 1     ┆ 5   │
│ X   ┆ 2     ┆ 13  │
└─────┴───────┴─────┘

Combination

Below are some examples of how to combine operations to create the DataFrame you require.

select · with_columns

df_x = df.with_columns((pl.col("a") * pl.col("b")).alias("a * b")).select(
    pl.all().exclude(["c", "d"])
)

print(df_x)

select · with_columns

let out = df
    .clone()
    .lazy()
    .with_columns([(col("a") * col("b")).alias("a * b")])
    .select([col("*").exclude(["c", "d"])])
    .collect()?;
println!("{}", out);

shape: (5, 3)
┌─────┬──────────┬──────────┐
│ a   ┆ b        ┆ a * b    │
│ --- ┆ ---      ┆ ---      │
│ i64 ┆ f64      ┆ f64      │
╞═════╪══════════╪══════════╡
│ 0   ┆ 0.418342 ┆ 0.0      │
│ 1   ┆ 0.178213 ┆ 0.178213 │
│ 2   ┆ 0.08584  ┆ 0.17168  │
│ 3   ┆ 0.811393 ┆ 2.43418  │
│ 4   ┆ 0.089994 ┆ 0.359977 │
└─────┴──────────┴──────────┘

select · with_columns

df_y = df.with_columns((pl.col("a") * pl.col("b")).alias("a * b")).select(
    pl.all().exclude("d")
)

print(df_y)

select · with_columns

let out = df
    .clone()
    .lazy()
    .with_columns([(col("a") * col("b")).alias("a * b")])
    .select([col("*").exclude(["d"])])
    .collect()?;
println!("{}", out);

shape: (5, 4)
┌─────┬──────────┬─────────────────────┬──────────┐
│ a   ┆ b        ┆ c                   ┆ a * b    │
│ --- ┆ ---      ┆ ---                 ┆ ---      │
│ i64 ┆ f64      ┆ datetime[μs]        ┆ f64      │
╞═════╪══════════╪═════════════════════╪══════════╡
│ 0   ┆ 0.418342 ┆ 2025-12-01 00:00:00 ┆ 0.0      │
│ 1   ┆ 0.178213 ┆ 2025-12-02 00:00:00 ┆ 0.178213 │
│ 2   ┆ 0.08584  ┆ 2025-12-03 00:00:00 ┆ 0.17168  │
│ 3   ┆ 0.811393 ┆ 2025-12-04 00:00:00 ┆ 2.43418  │
│ 4   ┆ 0.089994 ┆ 2025-12-05 00:00:00 ┆ 0.359977 │
└─────┴──────────┴─────────────────────┴──────────┘

Combining DataFrames

There are two ways DataFrames can be combined depending on the use case: join and concat.

Join

Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look at how to join two DataFrames into a single DataFrame. Our two DataFrames both have an 'id'-like column: a and x. We can use those columns to join the DataFrames in this example.

join

import numpy as np

df = pl.DataFrame(
    {
        "a": range(8),
        "b": np.random.rand(8),
        "d": [1.0, 2.0, float("nan"), float("nan"), 0.0, -5.0, -42.0, None],
    }
)

df2 = pl.DataFrame(
    {
        "x": range(8),
        "y": ["A", "A", "A", "B", "B", "C", "X", "X"],
    }
)
joined = df.join(df2, left_on="a", right_on="x")
print(joined)

join

use rand::Rng;
let mut rng = rand::thread_rng();

let df: DataFrame = df!(
    "a" => 0..8,
    "b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
    "d"=> [Some(1.0), Some(2.0), Some(f64::NAN), Some(f64::NAN), Some(0.0), Some(-5.0), Some(-42.), None]
)
.unwrap();
let df2: DataFrame = df!(
    "x" => 0..8,
    "y"=> &["A", "A", "A", "B", "B", "C", "X", "X"],
)
.unwrap();
let joined = df.join(&df2, ["a"], ["x"], JoinType::Left.into())?;
println!("{}", joined);

shape: (8, 4)
┌─────┬──────────┬───────┬─────┐
│ a   ┆ b        ┆ d     ┆ y   │
│ --- ┆ ---      ┆ ---   ┆ --- │
│ i64 ┆ f64      ┆ f64   ┆ str │
╞═════╪══════════╪═══════╪═════╡
│ 0   ┆ 0.997102 ┆ 1.0   ┆ A   │
│ 1   ┆ 0.265668 ┆ 2.0   ┆ A   │
│ 2   ┆ 0.884949 ┆ NaN   ┆ A   │
│ 3   ┆ 0.330684 ┆ NaN   ┆ B   │
│ 4   ┆ 0.005853 ┆ 0.0   ┆ B   │
│ 5   ┆ 0.57128  ┆ -5.0  ┆ C   │
│ 6   ┆ 0.120339 ┆ -42.0 ┆ X   │
│ 7   ┆ 0.918111 ┆ null  ┆ X   │
└─────┴──────────┴───────┴─────┘

To see more examples with other types of joins, see the Transformations section in the user guide.

Concat

We can also concatenate two DataFrames. Vertical concatenation will make the DataFrame longer. Horizontal concatenation will make the DataFrame wider. Below you can see the result of a horizontal concatenation of our two DataFrames.

hstack

stacked = df.hstack(df2)
print(stacked)

hstack

let stacked = df.hstack(df2.get_columns())?;
println!("{}", stacked);

shape: (8, 5)
┌─────┬──────────┬───────┬─────┬─────┐
│ a   ┆ b        ┆ d     ┆ x   ┆ y   │
│ --- ┆ ---      ┆ ---   ┆ --- ┆ --- │
│ i64 ┆ f64      ┆ f64   ┆ i64 ┆ str │
╞═════╪══════════╪═══════╪═════╪═════╡
│ 0   ┆ 0.997102 ┆ 1.0   ┆ 0   ┆ A   │
│ 1   ┆ 0.265668 ┆ 2.0   ┆ 1   ┆ A   │
│ 2   ┆ 0.884949 ┆ NaN   ┆ 2   ┆ A   │
│ 3   ┆ 0.330684 ┆ NaN   ┆ 3   ┆ B   │
│ 4   ┆ 0.005853 ┆ 0.0   ┆ 4   ┆ B   │
│ 5   ┆ 0.57128  ┆ -5.0  ┆ 5   ┆ C   │
│ 6   ┆ 0.120339 ┆ -42.0 ┆ 6   ┆ X   │
│ 7   ┆ 0.918111 ┆ null  ┆ 7   ┆ X   │
└─────┴──────────┴───────┴─────┴─────┘