Databases
Read from a database
Polars can read from a database using the pl.read_database_uri
and pl.read_database
functions.
Difference between read_database_uri
and read_database
Use pl.read_database_uri
if you want to specify the database connection with a connection string called a uri
. For example, the following snippet shows a query to read all columns from the foo
table in a Postgres database where we use the uri
to connect:
import polars as pl
uri = "postgresql://username:password@server:port/database"
query = "SELECT * FROM foo"
pl.read_database_uri(query=query, uri=uri)
On the other hand, use pl.read_database
if you want to connect via a connection engine created with a library like SQLAlchemy.
import polars as pl
from sqlalchemy import create_engine
conn = create_engine(f"sqlite:///test.db")
query = "SELECT * FROM foo"
pl.read_database(query=query, connection=conn.connect())
Note that pl.read_database_uri
is likely to be faster than pl.read_database
if you are using a SQLAlchemy or DBAPI2 connection as these connections may load the data row-wise into Python before copying the data again to the column-wise Apache Arrow format.
Engines
Polars doesn't manage connections and data transfer from databases by itself. Instead, external libraries (known as engines) handle this.
When using pl.read_database
, you specify the engine when you create the connection object. When using pl.read_database_uri
, you can specify one of two engines to read from the database:
- ConnectorX and
- ADBC
Both engines have native support for Apache Arrow and so can read data directly into a Polars DataFrame
without copying the data.
ConnectorX
ConnectorX is the default engine and supports numerous databases including Postgres, Mysql, SQL Server and Redshift. ConnectorX is written in Rust and stores data in Arrow format to allow for zero-copy to Polars.
To read from one of the supported databases with ConnectorX
you need to activate the additional dependency ConnectorX
when installing Polars or install it manually with
$ pip install connectorx
ADBC
ADBC (Arrow Database Connectivity) is an engine supported by the Apache Arrow project. ADBC aims to be both an API standard for connecting to databases and libraries implementing this standard in a range of languages.
It is still early days for ADBC so support for different databases is limited. At present, drivers for ADBC are only available for Postgres, SQLite and Snowflake. To install ADBC, you need to install the driver for your database. For example, to install the driver for SQLite, you run:
$ pip install adbc-driver-sqlite
As ADBC is not the default engine, you must specify the engine as an argument to pl.read_database_uri
.
uri = "postgresql://username:password@server:port/database"
query = "SELECT * FROM foo"
pl.read_database_uri(query=query, uri=uri, engine="adbc")
Write to a database
We can write to a database with Polars using the pl.write_database
function.
Engines
As with reading from a database above, Polars uses an engine to write to a database. The currently supported engines are:
- SQLAlchemy and
- Arrow Database Connectivity (ADBC)
SQLAlchemy
With the default engine SQLAlchemy you can write to any database supported by SQLAlchemy. To use this engine you need to install SQLAlchemy and Pandas
$ pip install SQLAlchemy pandas
In this example, we write the DataFrame
to a table called records
in the database
uri = "postgresql://username:password@server:port/database"
df = pl.DataFrame({"foo": [1, 2, 3]})
df.write_database(table_name="records", connection=uri)
In the SQLAlchemy approach, Polars converts the DataFrame
to a Pandas DataFrame
backed by PyArrow and then uses SQLAlchemy methods on a Pandas DataFrame
to write to the database.
ADBC
ADBC can also be used to write to a database. Writing is supported for the same databases that support reading with ADBC. As shown above, you need to install the appropriate ADBC driver for your database.
uri = "postgresql://username:password@server:port/database"
df = pl.DataFrame({"foo": [1, 2, 3]})
df.write_database(table_name="records", connection=uri, engine="adbc")