Python API#

Introduction#

There are four primary entry points to the Polars SQL interface, each operating at a different level of granularity. There is the SQLContext object, a top-level polars.sql() function that operates on the global context, frame-level DataFrame.sql() and LazyFrame.sql() methods, and the polars.sql_expr() function that creates native expressions from SQL.

Querying#

SQL queries can be issued against compatible data structures in the current globals, against specific frames, or incorporated into expressions.

Global SQL#

Both SQLContext and the polars.sql() function can be used to execute SQL queries mediated by the Polars execution engine against Polars DataFrame, LazyFrame, and Series data, as well as Pandas DataFrame and Series, and PyArrow Table and RecordBatch objects. Non-Polars objects are implicitly converted to DataFrame when used in a SQL query; for PyArrow, and Pandas data that uses PyArrow dtypes, this conversion can often be zero-copy if the underlying data maps cleanly to a natively-supported dtype.

Example:

import polars as pl
import pandas as pd

polars_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [4, 5, 6, 7]})
pandas_df = pd.DataFrame({"a": [3, 4, 5, 6], "b": [6, 7, 8, 9]})
polars_series = (polars_df["a"] * 2).rename("c")
pyarrow_table = polars_df.to_arrow()

pl.sql(
    """
    SELECT a, b, SUM(c) AS c_total FROM (
      SELECT * FROM polars_df                  -- polars frame
        UNION ALL SELECT * FROM pandas_df      -- pandas frame
        UNION ALL SELECT * FROM pyarrow_table  -- pyarrow table
    ) all_data
    INNER JOIN polars_series
      ON polars_series.c = all_data.b          -- polars series
    GROUP BY "a", "b"
    ORDER BY "a", "b"
    """
).collect()

# shape: (3, 3)
# ┌─────┬─────┬─────────┐
# │ a   ┆ b   ┆ c_total │
# │ --- ┆ --- ┆ ---     │
# │ i64 ┆ i64 ┆ i64     │
# ╞═════╪═════╪═════════╡
# │ 1   ┆ 4   ┆ 8       │
# │ 3   ┆ 6   ┆ 18      │
# │ 5   ┆ 8   ┆ 8       │
# └─────┴─────┴─────────┘

See also

SQLContext

Frame SQL#

Executes SQL directly against the specific underlying eager/lazy frame, referencing it as “self”; returns a new frame representing the query result.

Example:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
df.sql("""
  SELECT a::uint4, (b * b) AS bb
  FROM self WHERE a != 2
""")

# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ bb  │
# │ --- ┆ --- │
# │ u32 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 16  │
# │ 3   ┆ 36  │
# └─────┴─────┘

Expression SQL#

The polars.sql_expr() function can be used to create native Polars expressions from SQL fragments.

Example:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
df.with_columns(
    pl.sql_expr("(a * a) + (b::float / 2) AS expr1"),
    pl.sql_expr("CONCAT_WS(':',a,b) AS expr2")
)

# shape: (3, 4)
# ┌─────┬─────┬───────┬───────┐
# │ a   ┆ b   ┆ expr1 ┆ expr2 │
# │ --- ┆ --- ┆ ---   ┆ ---   │
# │ i64 ┆ i64 ┆ f64   ┆ str   │
# ╞═════╪═════╪═══════╪═══════╡
# │ 1   ┆ 4   ┆ 3.0   ┆ 1:4   │
# │ 2   ┆ 5   ┆ 6.5   ┆ 2:5   │
# │ 3   ┆ 6   ┆ 12.0  ┆ 3:6   │
# └─────┴─────┴───────┴───────┘

SQLContext#

Polars provides a dedicated class for querying frame data that offers additional control over table registration and management of state, and can also be used as a context manager. This is the SQLContext object, and it provides all of the core functionality used by the other SQL functions.

class polars.SQLContext[source]#

Run SQL queries against DataFrame/LazyFrame data.

__init__(
frames: Mapping[str, CompatibleFrameType | None] | None = None,
*,
register_globals: bool | int = False,
eager: bool = False,
**named_frames: CompatibleFrameType | None,
) None[source]#

Initialize a new SQLContext.

Parameters:
frames

A {name:frame, ...} mapping which can include Polars frames and pandas DataFrames, Series and pyarrow Table and RecordBatch objects.

register_globals

Register compatible objects (polars DataFrame, LazyFrame, and Series) found in the globals, automatically mapping their variable name to a table name. To register other objects (pandas/pyarrow data) pass them explicitly, or call the execute_global classmethod. If given an integer then only the most recent “n” objects found will be registered.

eager

If True, returns execution results as DataFrame instead of LazyFrame. (Note that the query itself is always executed in lazy-mode; this parameter impacts whether execute() returns an eager or lazy result frame).

**named_frames

Named eager/lazy frames, provided as kwargs.

Examples

>>> lf = pl.LazyFrame({"a": [1, 2, 3], "b": ["x", None, "z"]})
>>> res = pl.SQLContext(frame=lf).execute(
...     "SELECT b, a*2 AS two_a FROM frame WHERE b IS NOT NULL"
... )
>>> res.collect()
shape: (2, 2)
┌─────┬───────┐
│ b   ┆ two_a │
│ --- ┆ ---   │
│ str ┆ i64   │
╞═════╪═══════╡
│ x   ┆ 2     │
│ z   ┆ 6     │
└─────┴───────┘

Note: can also be used as a context manager.

__enter__() SQLContext[FrameType][source]#

Track currently registered tables on scope entry; supports nested scopes.

__exit__(
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: TracebackType | None,
) None[source]#

Unregister any tables created within the given scope on context exit.

See also

unregister

Methods#

SQLContext.execute(query, *[, eager])

Parse the given SQL query and execute it against the registered frame data.

SQLContext.execute_global(query, *[, eager])

Immediately execute a SQL query, automatically registering frame globals.

SQLContext.register(name, frame)

Register a single frame as a table, using the given name.

SQLContext.register_globals([n, all_compatible])

Register all frames (lazy or eager) found in the current globals scope.

SQLContext.register_many([frames])

Register multiple eager/lazy frames as tables, using the associated names.

SQLContext.tables()

Return a list of the registered table names.

SQLContext.unregister(names)

Unregister one or more eager/lazy frames by name.

Example:

import polars as pl

df1 = pl.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
df2 = pl.DataFrame({"id": [3, 2, 1], "value": [25.6, 53.4, 12.7]})

with pl.SQLContext(df_a=df1, df_b=df2, eager=True) as ctx:
    df = ctx.execute("""
      SELECT
        a.id,
        a.value AS value_a,
        b.value AS value_b
      FROM df_a AS a INNER JOIN df_b AS b USING (id)
      ORDER BY id
    """)

    # shape: (3, 3)
    # ┌─────┬─────────┬─────────┐
    # │ id  ┆ value_a ┆ value_b │
    # │ --- ┆ ---     ┆ ---     │
    # │ i64 ┆ f64     ┆ f64     │
    # ╞═════╪═════════╪═════════╡
    # │ 1   ┆ 0.1     ┆ 25.6    │
    # │ 2   ┆ 0.2     ┆ 53.4    │
    # │ 3   ┆ 0.3     ┆ 12.7    │
    # └─────┴─────────┴─────────┘

See also

pl.sql