polars.read_database

Read the results of a SQL query into a DataFrame.

Parameters:

query

Raw SQL query, or a list of queries to be read in parallel into a single DataFrame (connectorx).

connection

A connectorx or ADBC connection URI string that starts with the backend’s driver name, for example:

"postgresql://user:pass@server:port/database"
"snowflake://user:pass@account/database/schema?warehouse=warehouse&role=role"

partition_on

The column on which to partition the result (connectorx).

partition_range

The value range of the partition column (connectorx).

partition_num

How many partitions to generate (connectorx).

protocol

Backend-specific transfer protocol directive (connectorx); see connectorx documentation for more details.

engine : {'connectorx', 'adbc'}

Selects the engine used for reading the database (defaults to connectorx):

'connectorx': Supports a range of databases, including PostgreSQL, Redshift, MySQL, MariaDB, ClickHouse, Oracle, BigQuery, and SQL Server. For an up-to-date list, see the connectorx docs:
- https://github.com/sfu-db/connector-x#supported-sources--destinations
'adbc': Support for this engine is currently limited: relatively few drivers are available, and most are still in development. For an up-to-date list of drivers, see the ADBC docs:
- https://arrow.apache.org/adbc/

Notes

For connectorx, ensure that you have connectorx>=0.3.1 installed; see the connectorx documentation for details.

For adbc, you will need to install pyarrow and the ADBC driver for the backend you are connecting to, e.g. adbc-driver-postgresql.
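
One way to check the requirements above before calling `read_database` is sketched below. The helper `missing_packages` and the `REQUIRED` table are hypothetical names introduced for illustration; the PostgreSQL driver stands in for whichever ADBC driver your backend needs, and the sketch only checks that a package is installed, not that connectorx meets the 0.3.1 minimum.

```python
from importlib import metadata

# Distribution names per engine, taken from the notes above. The PostgreSQL
# ADBC driver is only an example; substitute the driver for your backend.
REQUIRED = {
    "connectorx": ["connectorx"],
    "adbc": ["pyarrow", "adbc-driver-postgresql"],
}


def missing_packages(engine: str) -> list[str]:
    """Return the required distributions that are not installed."""
    if engine not in REQUIRED:
        raise ValueError(f"unknown engine: {engine!r}")
    missing = []
    for dist in REQUIRED[engine]:
        try:
            metadata.version(dist)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(dist)
    return missing


for engine in REQUIRED:
    print(engine, "missing:", missing_packages(engine))
```

An empty list for the engine you intend to use suggests the imports that `read_database` performs should succeed.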

Examples

Read a DataFrame from a SQL query using a single thread:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database(query, uri)  
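
Credentials that contain characters such as `@` or `/` can break a hand-written URI. The sketch below assembles the URI with percent-encoded credentials; `build_uri` is a hypothetical helper, the host, port, and database values are placeholders, and it assumes the backend driver decodes percent-encoded credentials as standard URL parsers do.

```python
from urllib.parse import quote


def build_uri(driver, user, password, host, port, database):
    """Assemble a driver://user:pass@host:port/database URI."""
    # Percent-encode credentials so characters such as '@' or '/' in a
    # password do not break the URI structure.
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"{driver}://{user_enc}:{pass_enc}@{host}:{port}/{database}"


uri = build_uri("postgresql", "username", "p@ss/word", "localhost", 5432, "tpch")
print(uri)  # postgresql://username:p%40ss%2Fword@localhost:5432/tpch
```

The resulting string can be passed as the `connection` argument.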

Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database(
...     query,
...     uri,
...     partition_on="partition_col",
...     partition_num=10,
...     engine="connectorx",
... )  
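
If the bounds of the partition column are already known, they can be supplied through `partition_range`. The wrapper below is a minimal sketch of that call: `read_partitioned` and its `reader` parameter are hypothetical (the latter exists only so the function can be exercised without a database), and the bounds in the usage comment are made-up values.

```python
def read_partitioned(query, uri, column, lower, upper, num, reader=None):
    """Call read_database with an explicit partition range."""
    if reader is None:
        # Imported lazily so the sketch can be inspected without polars.
        import polars as pl

        reader = pl.read_database
    return reader(
        query,
        uri,
        partition_on=column,
        partition_range=(lower, upper),  # (min, max) of the partition column
        partition_num=num,
        engine="connectorx",
    )


# Usage (requires a reachable database and connectorx):
# df = read_partitioned(
#     "SELECT * FROM lineitem", uri, "partition_col", 1, 1_000_000, 10
# )
```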

Read a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:

>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_database(queries, uri, engine="connectorx")  
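
For more than a couple of partitions, the list of queries can be generated rather than written by hand. The following is a sketch under stated assumptions: `partition_queries` is a hypothetical helper, the partition column is assumed to hold integers, and the table and column names are interpolated directly into the SQL, so they must come from trusted input.

```python
def partition_queries(table, column, lower, upper, num):
    """Split [lower, upper] into `num` contiguous integer ranges, one query each."""
    if num < 1 or upper < lower:
        raise ValueError("need num >= 1 and upper >= lower")
    total = upper - lower + 1
    num = min(num, total)  # never create empty ranges
    size, extra = divmod(total, num)
    queries = []
    start = lower
    for i in range(num):
        # The first `extra` ranges take one additional value each.
        end = start + size + (1 if i < extra else 0) - 1
        queries.append(
            f"SELECT * FROM {table} WHERE {column} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries


queries = partition_queries("lineitem", "partition_col", 1, 20, 2)
for q in queries:
    print(q)
# SELECT * FROM lineitem WHERE partition_col BETWEEN 1 AND 10
# SELECT * FROM lineitem WHERE partition_col BETWEEN 11 AND 20
```

Unlike the open-ended `<=` / `>` pair in the example above, these closed ranges skip any rows whose partition value falls outside `[lower, upper]`, so the bounds need to cover the whole column.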

Read data from Snowflake using the ADBC driver:

>>> df = pl.read_database(
...     "SELECT * FROM test_table",
...     "snowflake://user:pass@company-org/testdb/public?warehouse=test&role=myrole",
...     engine="adbc",
... )