polars.sql#
- polars.sql(query: str, *, eager: bool = False) DataFrame | LazyFrame [source]#
Execute a SQL query against frames in the global namespace.
New in version 0.20.31.
- Parameters:
- query
SQL query to execute.
- eager
Automatically collect the result and return a DataFrame instead of a LazyFrame.
See also
Notes
The Polars SQL engine can operate against Polars DataFrame, LazyFrame, and Series objects, as well as Pandas DataFrame and Series, PyArrow Table and RecordBatch.
Additional control over registration and execution behaviour is available with the
SQLContext
object.
Examples
>>> lf1 = pl.LazyFrame({"a": [1, 2, 3], "b": [6, 7, 8], "c": ["z", "y", "x"]}) >>> lf2 = pl.LazyFrame({"a": [3, 2, 1], "d": [125, -654, 888]})
Query the LazyFrame using SQL:
>>> lf1.sql("SELECT c, b FROM self WHERE a > 1").collect() shape: (2, 2) ┌─────┬─────┐ │ c ┆ b │ │ --- ┆ --- │ │ str ┆ i64 │ ╞═════╪═════╡ │ y ┆ 7 │ │ x ┆ 8 │ └─────┴─────┘
Join two LazyFrames:
>>> pl.sql( ... ''' ... SELECT lf1.*, d ... FROM lf1 ... INNER JOIN lf2 USING (a) ... WHERE a > 1 AND b < 8 ... ''' ... ).collect() shape: (1, 4) ┌─────┬─────┬─────┬──────┐ │ a ┆ b ┆ c ┆ d │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str ┆ i64 │ ╞═════╪═════╪═════╪══════╡ │ 2 ┆ 7 ┆ y ┆ -654 │ └─────┴─────┴─────┴──────┘
Apply SQL transforms and subsequently filter natively (you can freely mix SQL and native operations):
>>> pl.sql( ... query=''' ... SELECT ... a, ... (a % 2 == 0) AS a_is_even, ... (b::float4 / 2) AS "b/2", ... CONCAT_WS(':', c, c, c) AS c_c_c ... FROM lf1 ... ORDER BY a ... ''', ... ).filter(~pl.col("c_c_c").str.starts_with("x")).collect() shape: (2, 4) ┌─────┬───────────┬─────┬───────┐ │ a ┆ a_is_even ┆ b/2 ┆ c_c_c │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ bool ┆ f32 ┆ str │ ╞═════╪═══════════╪═════╪═══════╡ │ 1 ┆ false ┆ 3.0 ┆ z:z:z │ │ 2 ┆ true ┆ 3.5 ┆ y:y:y │ └─────┴───────────┴─────┴───────┘
Join polars LazyFrame with a pandas DataFrame and a pyarrow Table:
>>> import pandas as pd >>> import pyarrow as pa >>> pl_frame = lf1 >>> pd_frame = pd.DataFrame({"a": [2, 3, 4], "d": [-0.5, 0.0, 0.5]}) >>> pa_table = pa.Table.from_arrays( ... [pa.array([1, 2, 3]), pa.array(["x", "y", "z"])], ... names=["a", "e"], ... ) >>> pl.sql( ... query=''' ... SELECT pl_frame.*, d, e ... FROM pl_frame ... JOIN pd_frame USING(a) ... JOIN pa_table USING(a) ... ''', ... ).collect() shape: (2, 5) ┌─────┬─────┬─────┬──────┬─────┐ │ a ┆ b ┆ c ┆ d ┆ e │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ str ┆ f64 ┆ str │ ╞═════╪═════╪═════╪══════╪═════╡ │ 2 ┆ 7 ┆ y ┆ -0.5 ┆ y │ │ 3 ┆ 8 ┆ x ┆ 0.0 ┆ z │ └─────┴─────┴─────┴──────┴─────┘