Visualization
Data in a Polars DataFrame can be visualized using common visualization libraries.
We illustrate plotting capabilities using the Iris dataset. We read a CSV file and then plot one column against another, colored by a third column.
import polars as pl
path = "docs/assets/data/iris.csv"
df = pl.read_csv(path)
print(df)
shape: (150, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species   │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ Setosa    │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ Setosa    │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ …            ┆ …           ┆ …            ┆ …           ┆ …         │
│ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ Virginica │
│ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ Virginica │
│ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ Virginica │
│ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ Virginica │
│ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ Virginica │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘
Built-in plotting with Altair
Polars has a plot method to create plots using Altair:
(
    df.plot.point(
        x="sepal_length",
        y="sepal_width",
        color="species",
    )
    .properties(width=500)
    .configure_scale(zero=False)
)
This is shorthand for:
import altair as alt
(
    alt.Chart(df).mark_point(tooltip=True).encode(
        x="sepal_length",
        y="sepal_width",
        color="species",
    )
    .properties(width=500)
    .configure_scale(zero=False)
)
The plot method is provided only for convenience, and to signal that Altair is known to work well with Polars.
hvPlot
If you import hvplot.polars, then it registers a hvplot method which you can use to create interactive plots using hvPlot.
import hvplot.polars
df.hvplot.scatter(
    x="sepal_width",
    y="sepal_length",
    by="species",
    width=650,
)
Matplotlib
To create a scatter plot, we can pass columns of a DataFrame directly to Matplotlib as a Series for each column. Matplotlib does not have explicit support for Polars objects, but it can accept a Polars Series by converting it to a NumPy array (which is zero-copy for numeric data without null values).
Note that because the column 'species' isn't numeric, we first need to convert it to numeric values so that it can be passed as an argument to c.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(
    x=df["sepal_width"],
    y=df["sepal_length"],
    c=df["species"].cast(pl.Categorical).to_physical(),
)
Seaborn and Plotly
Seaborn and Plotly can accept a Polars DataFrame by leveraging the dataframe interchange protocol, which offers zero-copy conversion where possible. Note that the protocol does not support all Polars data types (e.g. List) so your mileage may vary here.
Seaborn
import seaborn as sns
sns.scatterplot(
    df,
    x="sepal_width",
    y="sepal_length",
    hue="species",
)
Plotly
import plotly.express as px
px.scatter(
    df,
    x="sepal_width",
    y="sepal_length",
    color="species",
    width=650,
)