polars.read_database_uri

polars.read_database_uri(
query: list[str] | str,
uri: str,
*,
partition_on: str | None = None,
partition_range: tuple[int, int] | None = None,
partition_num: int | None = None,
protocol: str | None = None,
engine: DbReadEngine | None = None,
schema_overrides: SchemaDict | None = None,
execute_options: dict[str, Any] | None = None,
) -> DataFrame

Read the results of a SQL query into a DataFrame, given a URI.

Parameters:
query

Raw SQL query (or queries).

uri

A connectorx or ADBC connection URI string that starts with the backend’s driver name, for example:

  • "postgresql://user:pass@server:port/database"

  • "snowflake://user:pass@account/database/schema?warehouse=warehouse&role=role"

The caller is responsible for escaping any special characters in the string, which will be passed "as-is" to the underlying engine (this is most often needed when the password contains special characters).

partition_on

The column on which to partition the result (connectorx).

partition_range

The value range of the partition column (connectorx).

partition_num

How many partitions to generate (connectorx).

protocol

Backend-specific transfer protocol directive (connectorx); see connectorx documentation for more details.
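For example, connectorx's PostgreSQL backend accepts a "binary" transfer protocol; valid values are backend-specific, and the one shown here is illustrative:

>>> pl.read_database_uri(query, uri, protocol="binary")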

engine : {'connectorx', 'adbc'}

Selects the engine used for reading the database (defaulting to connectorx):

  • 'connectorx' Supports a range of databases, such as PostgreSQL, Redshift, MySQL, MariaDB, ClickHouse, Oracle, BigQuery, SQL Server, and so on. For an up-to-date list please see the connectorx docs: https://github.com/sfu-db/connector-x#supported-sources--destinations

  • 'adbc' Currently there is limited support for this engine, with a relatively small number of drivers available, most of which are still in development. For an up-to-date list of drivers please see the ADBC docs: https://arrow.apache.org/adbc/

schema_overrides

A dictionary mapping column names to dtypes, used to override the schema inferred from the data returned by the query.

execute_options

These options will be passed to the underlying query execution method as kwargs. Note that connectorx does not support this parameter, and ADBC currently supports only positional 'qmark'-style parameterization.

See also

read_database

Create a DataFrame from a SQL query using a connection object.

Notes

For connectorx, ensure that you have connectorx>=0.3.2 installed; documentation is available in the connectorx repository: https://github.com/sfu-db/connector-x

For adbc you will need to have installed pyarrow and the ADBC driver associated with the backend you are connecting to, e.g. adbc-driver-postgresql.

If your password contains special characters, you will need to escape them. This will usually require the use of a URL-escaping function, for example:

>>> from urllib.parse import quote, quote_plus
>>> quote_plus("pass word?")
'pass+word%3F'
>>> quote("pass word?")
'pass%20word%3F'
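
The escaped value can then be interpolated back into the URI (the credentials shown are illustrative):

>>> password = quote_plus("pass word?")
>>> uri = f"postgresql://username:{password}@server:port/database"
>>> uri
'postgresql://username:pass+word%3F@server:port/database'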

Examples

Create a DataFrame from a SQL query using a single thread:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database_uri(query, uri)  

Create a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database_uri(
...     query,
...     uri,
...     partition_on="partition_col",
...     partition_num=10,
...     engine="connectorx",
... )  
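
The bounds of the partition column can also be supplied explicitly via partition_range (the range below is illustrative):

>>> pl.read_database_uri(
...     query,
...     uri,
...     partition_on="partition_col",
...     partition_range=(0, 1000),
...     partition_num=10,
...     engine="connectorx",
... )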

Create a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:

>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_database_uri(queries, uri, engine="connectorx")  

Read data from Snowflake using the ADBC driver:

>>> df = pl.read_database_uri(
...     "SELECT * FROM test_table",
...     "snowflake://user:pass@company-org/testdb/public?warehouse=test&role=myrole",
...     engine="adbc",
... )  

Pass a single parameter via execute_options into a query using the ADBC driver:

>>> df = pl.read_database_uri(
...     "SELECT * FROM employees WHERE hourly_rate > ?",
...     "sqlite:///:memory:",
...     engine="adbc",
...     execute_options={"parameters": (30,)},
... )  

Or pass multiple parameters:

>>> df = pl.read_database_uri(
...     "SELECT * FROM employees WHERE hourly_rate BETWEEN ? AND ?",
...     "sqlite:///:memory:",
...     engine="adbc",
...     execute_options={"parameters": (40, 20)},
... )
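
Override the dtype inferred for a column by passing schema_overrides (the column name and dtype here are illustrative):

>>> df = pl.read_database_uri(
...     "SELECT * FROM employees",
...     "sqlite:///:memory:",
...     engine="adbc",
...     schema_overrides={"hourly_rate": pl.Float32},
... )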