polars.read_database_uri
polars.read_database_uri(
    query: list[str] | str,
    uri: str,
    *,
    partition_on: str | None = None,
    partition_range: tuple[int, int] | None = None,
    partition_num: int | None = None,
    protocol: str | None = None,
    engine: DbReadEngine | None = None,
    schema_overrides: SchemaDict | None = None,
    execute_options: dict[str, Any] | None = None,
) -> DataFrame
Read the results of a SQL query into a DataFrame, given a URI.
Parameters:
query
Raw SQL query (or queries).
uri
A connectorx or ADBC connection URI string that starts with the backend's driver name, for example:
"postgresql://user:pass@server:port/database"
"snowflake://user:pass@account/database/schema?warehouse=warehouse&role=role"
The caller is responsible for escaping any special characters in the string, which will be passed "as-is" to the underlying engine (this is most often required when the password contains special characters).
partition_on
The column on which to partition the result (connectorx).
partition_range
The value range of the partition column (connectorx).
partition_num
How many partitions to generate (connectorx).
protocol
Backend-specific transfer protocol directive (connectorx); see the connectorx documentation for more details.
engine : {'connectorx', 'adbc'}
Selects the engine used for reading the database (defaulting to connectorx):
'connectorx'
Supports a range of databases, such as PostgreSQL, Redshift, MySQL, MariaDB, ClickHouse, Oracle, BigQuery, SQL Server, and so on. For an up-to-date list please see the connectorx docs: https://github.com/sfu-db/connector-x#supported-sources--destinations
'adbc'
Currently there is limited support for this engine, with a relatively small number of drivers available, most of which are still in development. For an up-to-date list of drivers please see the ADBC docs: https://arrow.apache.org/adbc/
schema_overrides
A dictionary mapping column names to dtypes, used to override the schema given in the data returned by the query.
execute_options
These options will be passed to the underlying query execution method as kwargs. Note that connectorx does not support this parameter.
See also
read_database
Create a DataFrame from a SQL query using a connection object.
Notes
For connectorx, ensure that you have connectorx>=0.3.2; its documentation is available at https://github.com/sfu-db/connector-x.
For adbc you will need to have installed pyarrow and the ADBC driver associated with the backend you are connecting to, e.g. adbc-driver-postgresql.
If your password contains special characters, you will need to escape them. This will usually require the use of a URL-escaping function, for example:
>>> from urllib.parse import quote, quote_plus
>>> quote_plus("pass word?")
'pass+word%3F'
>>> quote("pass word?")
'pass%20word%3F'
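The escaped password can then be spliced into the connection URI; a minimal sketch (the credentials and host below are illustrative):

```python
from urllib.parse import quote_plus

# URL-escape a password containing special characters before building
# the connection URI; "@", " ", and "?" are all percent/plus-encoded.
password = "p@ss word?"
uri = f"postgresql://user:{quote_plus(password)}@server:5432/database"
print(uri)  # postgresql://user:p%40ss+word%3F@server:5432/database
```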
Examples
Create a DataFrame from a SQL query using a single thread:
>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database_uri(query, uri)
Create a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:
>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database_uri(
...     query,
...     uri,
...     partition_on="partition_col",
...     partition_num=10,
...     engine="connectorx",
... )
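The partition bounds can also be given explicitly with partition_range; a sketch (the column name and bounds are hypothetical, and the call itself requires a live database, so it is commented out):

```python
# Split the scan into 4 range-partitions covering values 0..100_000 of
# the hypothetical integer column "partition_col".
partition_kwargs = {
    "partition_on": "partition_col",   # column to split on
    "partition_range": (0, 100_000),   # min/max of that column
    "partition_num": 4,                # number of partitions
}
# pl.read_database_uri(query, uri, engine="connectorx", **partition_kwargs)
```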
Create a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:
>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_database_uri(queries, uri, engine="connectorx")
Read data from Snowflake using the ADBC driver:
>>> df = pl.read_database_uri(
...     "SELECT * FROM test_table",
...     "snowflake://user:pass@company-org/testdb/public?warehouse=test&role=myrole",
...     engine="adbc",
... )