SQL Clauses#

Function

Description

SELECT

Retrieves specific column data from one or more tables.

FROM

Specify the table(s) from which to retrieve or delete data.

JOIN

Combine rows from two or more tables based on a related column.

WHERE

Filter rows returned from the query based on specific condition(s).

GROUP BY

Aggregate row values based based on one or more key columns.

HAVING

Filter groups in a GROUP BY based on specific condition(s).

ORDER BY

Sort the query result based on one or more specified columns.

LIMIT

Specify the number of rows returned.

OFFSET

Skip a specified number of rows.

SELECT#

Select the columns to be returned by the query.

Example:

df = pl.DataFrame(
  {
    "a": [1, 2, 3],
    "b": ["zz", "yy", "xx"],
  }
)
df.sql("""
  SELECT a, b FROM self
""")
# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════╡
# │ 1   ┆ zz  │
# │ 2   ┆ yy  │
# │ 3   ┆ xx  │
# └─────┴─────┘

FROM#

Specifies the table(s) from which to retrieve or delete data.

Example:

df = pl.DataFrame(
  {
    "a": [1, 2, 3],
    "b": ["zz", "yy", "xx"],
  }
)
df.sql("""
  SELECT * FROM self
""")
# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════╡
# │ 1   ┆ zz  │
# │ 2   ┆ yy  │
# │ 3   ┆ xx  │
# └─────┴─────┘

JOIN#

Combines rows from two or more tables based on a related column.

Join Types

  • CROSS JOIN

  • FULL JOIN

  • INNER JOIN

  • LEFT JOIN

  • [LEFT] ANTI JOIN

  • [LEFT] SEMI JOIN

  • RIGHT ANTI JOIN

  • RIGHT SEMI JOIN

Example:

df1 = pl.DataFrame(
  {
    "foo": [1, 2, 3],
    "ham": ["a", "b", "c"],
  }
)
df2 = pl.DataFrame(
  {
    "apple": ["x", "y", "z"],
    "ham": ["a", "b", "d"],
  }
)
pl.sql("""
  SELECT foo, apple, COALESCE(df1.ham, df2.ham) AS ham
  FROM df1 FULL JOIN df2
  USING (ham)
""").collect()

# shape: (4, 3)
# ┌──────┬───────┬─────┐
# │ foo  ┆ apple ┆ ham │
# │ ---  ┆ ---   ┆ --- │
# │ i64  ┆ str   ┆ str │
# ╞══════╪═══════╪═════╡
# │ 1    ┆ x     ┆ a   │
# │ 2    ┆ y     ┆ b   │
# │ null ┆ z     ┆ d   │
# │ 3    ┆ null  ┆ c   │
# └──────┴───────┴─────┘

WHERE#

Filter rows returned from the query based on specific condition(s).

df = pl.DataFrame(
  {
    "foo": [30, 40, 50],
    "ham": ["a", "b", "c"],
  }
)
df.sql("""
  SELECT * FROM self WHERE foo > 42
""")
# shape: (1, 2)
# ┌─────┬─────┐
# │ foo ┆ ham │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪═════╡
# │ 50  ┆ c   │
# └─────┴─────┘

GROUP BY#

Group rows that have the same values in specified columns into summary rows.

Example:

df = pl.DataFrame(
    {
      "foo": ["a", "b", "b"],
      "bar": [10, 20, 30],
    }
  )
df.sql("""
  SELECT foo, SUM(bar) FROM self GROUP BY foo
""")
# shape: (2, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ b   ┆ 50  │
# │ a   ┆ 10  │
# └─────┴─────┘

HAVING#

Filter groups in a GROUP BY based on specific condition(s).

df = pl.DataFrame(
      {
      "foo": ["a", "b", "b", "c"],
      "bar": [10, 20, 30, 40],
    }
  )
df.sql("""
  SELECT foo, SUM(bar) FROM self GROUP BY foo HAVING bar >= 40
""")
# shape: (2, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ c   ┆ 40  │
# │ b   ┆ 50  │
# └─────┴─────┘

ORDER BY#

Sort the query result based on one or more specified columns.

Example:

df = pl.DataFrame(
  {
    "foo": ["b", "a", "c", "b"],
    "bar": [20, 10, 40, 30],
  }
)
df.sql("""
  SELECT foo, bar FROM self ORDER BY bar DESC
""")
# shape: (4, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ c   ┆ 40  │
# │ b   ┆ 30  │
# │ b   ┆ 20  │
# │ a   ┆ 10  │
# └─────┴─────┘

LIMIT#

Limit the number of rows returned by the query.

Example:

df = pl.DataFrame(
  {
    "foo": ["b", "a", "c", "b"],
    "bar": [20, 10, 40, 30],
  }
)
df.sql("""
  SELECT foo, bar FROM self LIMIT 2
""")
# shape: (2, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ b   ┆ 20  │
# │ a   ┆ 10  │
# └─────┴─────┘

OFFSET#

Skip a number of rows before starting to return rows from the query.

Example:

df = pl.DataFrame(
  {
    "foo": ["b", "a", "c", "b"],
    "bar": [20, 10, 40, 30],
  }
)
df.sql("""
  SELECT foo, bar FROM self LIMIT 2 OFFSET 2
""")
# shape: (2, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ c   ┆ 40  │
# │ b   ┆ 30  │
# └─────┴─────┘