polars.read_database_uri#
- polars.read_database_uri(
- query: list[str] | str,
- uri: str,
- *,
- partition_on: str | None = None,
- partition_range: tuple[int, int] | None = None,
- partition_num: int | None = None,
- protocol: str | None = None,
- engine: DbReadEngine | None = None,
- schema_overrides: SchemaDict | None = None,
Read the results of a SQL query into a DataFrame, given a URI.
- Parameters:
- query
Raw SQL query (or queries).
- uri
A connectorx or ADBC connection URI string that starts with the backend’s driver name, for example:
“postgresql://user:pass@server:port/database”
“snowflake://user:pass@account/database/schema?warehouse=warehouse&role=role”
- partition_on
The column on which to partition the result (connectorx).
- partition_range
The value range of the partition column (connectorx).
- partition_num
How many partitions to generate (connectorx).
- protocol
Backend-specific transfer protocol directive (connectorx); see connectorx documentation for more details.
- engine{‘connectorx’, ‘adbc’}
Selects the engine used for reading the database (defaulting to connectorx):
'connectorx'
Supports a range of databases, such as PostgreSQL, Redshift, MySQL, MariaDB, Clickhouse, Oracle, BigQuery, SQL Server, and so on. For an up-to-date list please see the connectorx docs:'adbc'
Currently there is limited support for this engine, with a relatively small number of drivers available, most of which are still in development. For an up-to-date list of drivers please see the ADBC docs:
- schema_overrides
A dictionary mapping column names to dtypes, used to override the schema given in the data returned by the query.
See also
read_database
Create a DataFrame from a SQL query using a connection object.
Notes
For
connectorx
, ensure that you haveconnectorx>=0.3.2
. The documentation is available here.For
adbc
you will need to have installedpyarrow
and the ADBC driver associated with the backend you are connecting to, eg:adbc-driver-postgresql
.Examples
Create a DataFrame from a SQL query using a single thread:
>>> uri = "postgresql://username:password@server:port/database" >>> query = "SELECT * FROM lineitem" >>> pl.read_database_uri(query, uri)
Create a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:
>>> uri = "postgresql://username:password@server:port/database" >>> query = "SELECT * FROM lineitem" >>> pl.read_database_uri( ... query, ... uri, ... partition_on="partition_col", ... partition_num=10, ... engine="connectorx", ... )
Create a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:
>>> uri = "postgresql://username:password@server:port/database" >>> queries = [ ... "SELECT * FROM lineitem WHERE partition_col <= 10", ... "SELECT * FROM lineitem WHERE partition_col > 10", ... ] >>> pl.read_database_uri(queries, uri, engine="connectorx")
Read data from Snowflake using the ADBC driver:
>>> df = pl.read_database_uri( ... "SELECT * FROM test_table", ... "snowflake://user:pass@company-org/testdb/public?warehouse=test&role=myrole", ... engine="adbc", ... )