polars.DataFrame.sql#

DataFrame.sql(query: str, *, table_name: str | None = None) Self[source]#

Execute a SQL query against the DataFrame.

New in version 0.20.24.

Warning

This functionality is considered unstable, although it is close to being considered stable. It may be changed at any point without it being considered a breaking change.

Parameters:
query

SQL query to execute.

table_name

Optionally provide an explicit name for the table that represents the calling frame (the alias “self” will always be registered/available).

See also

SQLContext

Notes

  • The calling frame is automatically registered as a table in the SQL context under the name “self”. All DataFrames and LazyFrames found in the current set of global variables are also registered, using their variable name.

  • More control over registration and execution behaviour is available by using the SQLContext object.

  • The SQL query executes entirely in lazy mode before being collected and returned as a DataFrame.

Examples

>>> from datetime import date
>>> df1 = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": ["zz", "yy", "xx"],
...         "c": [date(1999, 12, 31), date(2010, 10, 10), date(2077, 8, 8)],
...     }
... )

Query the DataFrame using SQL:

>>> df1.sql("SELECT c, b FROM self WHERE a > 1")
shape: (2, 2)
┌────────────┬─────┐
│ c          ┆ b   │
│ ---        ┆ --- │
│ date       ┆ str │
╞════════════╪═════╡
│ 2010-10-10 ┆ yy  │
│ 2077-08-08 ┆ xx  │
└────────────┴─────┘

Join two DataFrames using SQL.

>>> df2 = pl.DataFrame({"a": [3, 2, 1], "d": [125, -654, 888]})
>>> df1.sql(
...     '''
...     SELECT self.*, d
...     FROM self
...     INNER JOIN df2 USING (a)
...     WHERE a > 1 AND EXTRACT(year FROM c) < 2050
...     '''
... )
shape: (1, 4)
┌─────┬─────┬────────────┬──────┐
│ a   ┆ b   ┆ c          ┆ d    │
│ --- ┆ --- ┆ ---        ┆ ---  │
│ i64 ┆ str ┆ date       ┆ i64  │
╞═════╪═════╪════════════╪══════╡
│ 2   ┆ yy  ┆ 2010-10-10 ┆ -654 │
└─────┴─────┴────────────┴──────┘

Apply transformations to a DataFrame using SQL, aliasing “self” to “frame”.

>>> df1.sql(
...     query='''
...         SELECT
...             a,
...             (a % 2 == 0) AS a_is_even,
...             CONCAT_WS(':', b, b) AS b_b,
...             EXTRACT(year FROM c) AS year,
...             0::float4 AS "zero",
...         FROM frame
...     ''',
...     table_name="frame",
... )
shape: (3, 5)
┌─────┬───────────┬───────┬──────┬──────┐
│ a   ┆ a_is_even ┆ b_b   ┆ year ┆ zero │
│ --- ┆ ---       ┆ ---   ┆ ---  ┆ ---  │
│ i64 ┆ bool      ┆ str   ┆ i32  ┆ f32  │
╞═════╪═══════════╪═══════╪══════╪══════╡
│ 1   ┆ false     ┆ zz:zz ┆ 1999 ┆ 0.0  │
│ 2   ┆ true      ┆ yy:yy ┆ 2010 ┆ 0.0  │
│ 3   ┆ false     ┆ xx:xx ┆ 2077 ┆ 0.0  │
└─────┴───────────┴───────┴──────┴──────┘