Python API#
Introduction#
There are four primary entry points to the Polars SQL interface, each operating at a
different level of granularity. There is the SQLContext
object,
a top-level polars.sql()
function that operates on the global context,
frame-level DataFrame.sql()
and LazyFrame.sql()
methods, and the
polars.sql_expr()
function that creates native expressions from SQL.
Querying#
SQL queries can be issued against compatible data structures in the current globals, against specific frames, or incorporated into expressions.
Global SQL#
Both SQLContext
and the polars.sql()
function can be used
to execute SQL queries mediated by the Polars execution engine against Polars
DataFrame, LazyFrame, and Series
data, as well as Pandas DataFrame and Series, and
PyArrow Table and RecordBatch objects.
Non-Polars objects are implicitly converted to DataFrame when used in a SQL query; for
PyArrow, and Pandas data that uses PyArrow dtypes, this conversion can often be
zero-copy if the underlying data maps cleanly to a natively-supported dtype.
Example:
import polars as pl
import pandas as pd
polars_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [4, 5, 6, 7]})
pandas_df = pd.DataFrame({"a": [3, 4, 5, 6], "b": [6, 7, 8, 9]})
polars_series = (polars_df["a"] * 2).rename("c")
pyarrow_table = polars_df.to_arrow()
pl.sql(
"""
SELECT a, b, SUM(c) AS c_total FROM (
SELECT * FROM polars_df -- polars frame
UNION ALL SELECT * FROM pandas_df -- pandas frame
UNION ALL SELECT * FROM pyarrow_table -- pyarrow table
) all_data
INNER JOIN polars_series
ON polars_series.c = all_data.b -- polars series
GROUP BY "a", "b"
ORDER BY "a", "b"
"""
).collect()
# shape: (3, 3)
# ┌─────┬─────┬─────────┐
# │ a ┆ b ┆ c_total │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════════╡
# │ 1 ┆ 4 ┆ 8 │
# │ 3 ┆ 6 ┆ 18 │
# │ 5 ┆ 8 ┆ 8 │
# └─────┴─────┴─────────┘
See also
Frame SQL#
Executes SQL directly against the specific underlying eager/lazy frame, referencing it as “self”; returns a new frame representing the query result.
Example:
import polars as pl
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
})
df.sql("""
SELECT a::uint4, (b * b) AS bb
FROM self WHERE a != 2
""")
# shape: (2, 2)
# ┌─────┬─────┐
# │ a ┆ bb │
# │ --- ┆ --- │
# │ u32 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 16 │
# │ 3 ┆ 36 │
# └─────┴─────┘
Expression SQL#
The polars.sql_expr()
function can be used to create native Polars expressions
from SQL fragments.
Example:
import polars as pl
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
})
df.with_columns(
pl.sql_expr("(a * a) + (b::float / 2) AS expr1"),
pl.sql_expr("CONCAT_WS(':',a,b) AS expr2")
)
# shape: (3, 4)
# ┌─────┬─────┬───────┬───────┐
# │ a ┆ b ┆ expr1 ┆ expr2 │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ f64 ┆ str │
# ╞═════╪═════╪═══════╪═══════╡
# │ 1 ┆ 4 ┆ 3.0 ┆ 1:4 │
# │ 2 ┆ 5 ┆ 6.5 ┆ 2:5 │
# │ 3 ┆ 6 ┆ 12.0 ┆ 3:6 │
# └─────┴─────┴───────┴───────┘
SQLContext#
Polars provides a dedicated class for querying frame data that offers additional
control over table registration and management of state, and can also be used as
a context manager. This is the SQLContext
object, and it provides all of
the core functionality used by the other SQL functions.
- class polars.SQLContext[source]#
Run SQL queries against DataFrame/LazyFrame data.
- __init__(
- frames: Mapping[str, CompatibleFrameType | None] | None = None,
- *,
- register_globals: bool | int = False,
- eager: bool = False,
- **named_frames: CompatibleFrameType | None,
Initialize a new
SQLContext
.- Parameters:
- frames
A
{name:frame, ...}
mapping which can include Polars frames and pandas DataFrames, Series and pyarrow Table and RecordBatch objects.- register_globals
Register compatible objects (polars DataFrame, LazyFrame, and Series) found in the globals, automatically mapping their variable name to a table name. To register other objects (pandas/pyarrow data) pass them explicitly, or call the
execute_global
classmethod. If given an integer then only the most recent “n” objects found will be registered.- eager
If True, returns execution results as
DataFrame
instead ofLazyFrame
. (Note that the query itself is always executed in lazy-mode; this parameter impacts whetherexecute()
returns an eager or lazy result frame).- **named_frames
Named eager/lazy frames, provided as kwargs.
Examples
>>> lf = pl.LazyFrame({"a": [1, 2, 3], "b": ["x", None, "z"]}) >>> res = pl.SQLContext(frame=lf).execute( ... "SELECT b, a*2 AS two_a FROM frame WHERE b IS NOT NULL" ... ) >>> res.collect() shape: (2, 2) ┌─────┬───────┐ │ b ┆ two_a │ │ --- ┆ --- │ │ str ┆ i64 │ ╞═════╪═══════╡ │ x ┆ 2 │ │ z ┆ 6 │ └─────┴───────┘
Note: can also be used as a context manager.
- __enter__() SQLContext[FrameType] [source]#
Track currently registered tables on scope entry; supports nested scopes.
- __exit__(
- exc_type: type[BaseException] | None,
- exc_val: BaseException | None,
- exc_tb: TracebackType | None,
Unregister any tables created within the given scope on context exit.
See also
Methods#
|
Parse the given SQL query and execute it against the registered frame data. |
|
Immediately execute a SQL query, automatically registering frame globals. |
|
Register a single frame as a table, using the given name. |
|
Register all frames (lazy or eager) found in the current globals scope. |
|
Register multiple eager/lazy frames as tables, using the associated names. |
Return a list of the registered table names. |
|
|
Unregister one or more eager/lazy frames by name. |
Example:
import polars as pl
df1 = pl.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
df2 = pl.DataFrame({"id": [3, 2, 1], "value": [25.6, 53.4, 12.7]})
with pl.SQLContext(df_a=df1, df_b=df2, eager=True) as ctx:
df = ctx.execute("""
SELECT
a.id,
a.value AS value_a,
b.value AS value_b
FROM df_a AS a INNER JOIN df_b AS b USING (id)
ORDER BY id
""")
# shape: (3, 3)
# ┌─────┬─────────┬─────────┐
# │ id ┆ value_a ┆ value_b │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ f64 ┆ f64 │
# ╞═════╪═════════╪═════════╡
# │ 1 ┆ 0.1 ┆ 25.6 │
# │ 2 ┆ 0.2 ┆ 53.4 │
# │ 3 ┆ 0.3 ┆ 12.7 │
# └─────┴─────────┴─────────┘
See also