Testing#

Asserts#

Polars provides some standard asserts for use with unit tests:

testing.assert_frame_equal(left, right, *[, ...])

Raise detailed AssertionError if left does NOT equal right.

testing.assert_series_equal(left, right, *)

Raise detailed AssertionError if left does NOT equal right.

Parametric testing#

See the Hypothesis library documentation for more details about property-based testing, strategies, and library integrations.

Polars primitives#

Polars provides the following hypothesis testing primitives and strategy generators/helpers to make it easy to generate suitable test DataFrames and Series.

testing.parametric.dataframes()

Hypothesis strategy for producing polars DataFrames or LazyFrames.

testing.parametric.series(*[, name, dtype, ...])

Hypothesis strategy for producing polars Series.

Strategy helpers#

testing.parametric.column(name[, dtype, ...])

Define a column for use with the @dataframes strategy.

testing.parametric.columns([cols, dtype, ...])

Define multiple columns for use with the @dataframes strategy.

testing.parametric.create_list_strategy(...)

Hypothesis strategy for producing polars List data.

Profiles#

Several standard/named hypothesis profiles are provided:

  • fast: runs 100 iterations.

  • balanced: runs 1,000 iterations.

  • expensive: runs 10,000 iterations.

The load/set helper functions allow you to access these profiles directly, set your preferred profile (default is fast), or set a custom number of iterations.

testing.parametric.load_profile([profile, ...])

Load a named (or custom) hypothesis profile for use with the parametric tests.

testing.parametric.set_profile(profile)

Set the env var POLARS_HYPOTHESIS_PROFILE to the given profile name/value.

Approximate profile timings:

Running polars’ own parametric unit tests on 0.17.6 against release and debug builds, on a machine with 12 cores, using xdist -n auto results in the following timings (these values are indicative only, and may vary significantly depending on your own hardware setup):

Profile     Iterations   Release          Debug
fast        100          ~6 secs          ~8 secs
balanced    1,000        ~22 secs         ~30 secs
expensive   10,000       ~3 mins 5 secs   ~4 mins 45 secs

Examples#

Basic: Create a parametric unit test that will receive a series of generated DataFrames, each having 5 numeric columns with a 10% chance of any generated value being null (this is distinct from NaN).

from polars.testing.parametric import dataframes
from polars import NUMERIC_DTYPES
from hypothesis import given


@given(
    dataframes(
        cols=5,
        null_probability=0.1,
        allowed_dtypes=NUMERIC_DTYPES,
    )
)
def test_numeric(df):
    assert all(df[col].is_numeric() for col in df.columns)

    # Example frame:
    # ┌──────┬────────┬───────┬────────────┬────────────┐
    # │ col0 ┆ col1   ┆ col2  ┆ col3       ┆ col4       │
    # │ ---  ┆ ---    ┆ ---   ┆ ---        ┆ ---        │
    # │ u8   ┆ i16    ┆ u16   ┆ i32        ┆ f64        │
    # ╞══════╪════════╪═══════╪════════════╪════════════╡
    # │ 54   ┆ -29096 ┆ 485   ┆ 2147483647 ┆ -2.8257e14 │
    # │ null ┆ 7508   ┆ 37338 ┆ 7264       ┆ 1.5        │
    # │ 0    ┆ 321    ┆ null  ┆ 16996      ┆ NaN        │
    # │ 121  ┆ -361   ┆ 63204 ┆ 1          ┆ 1.1443e235 │
    # └──────┴────────┴───────┴────────────┴────────────┘

Intermediate: Integrate hypothesis-native strategies into specifically-named columns, generating a series of LazyFrames, with a minimum size of five rows and values that conform to the given strategies:

from polars.testing.parametric import column, dataframes
from hypothesis.strategies import floats, sampled_from, text
from hypothesis import given

from string import ascii_letters, digits

id_chars = ascii_letters + digits


@given(
    dataframes(
        cols=[
            column("id", strategy=text(min_size=4, max_size=4, alphabet=id_chars)),
            column("ccy", strategy=sampled_from(["GBP", "EUR", "JPY", "USD"])),
            column("price", strategy=floats(min_value=0.0, max_value=1000.0)),
        ],
        min_size=5,
        lazy=True,
    )
)
def test_price_calculations(lf):
    ...
    print(lf.collect())

    # Example frame:
    # ┌──────┬─────┬─────────┐
    # │ id   ┆ ccy ┆ price   │
    # │ ---  ┆ --- ┆ ---     │
    # │ str  ┆ str ┆ f64     │
    # ╞══════╪═════╪═════════╡
    # │ A101 ┆ GBP ┆ 1.1     │
    # │ 8nIn ┆ JPY ┆ 1.5     │
    # │ QHoO ┆ EUR ┆ 714.544 │
    # │ i0e0 ┆ GBP ┆ 0.0     │
    # │ 0000 ┆ USD ┆ 999.0   │
    # └──────┴─────┴─────────┘

Advanced: Create and use a List[UInt8] dtype strategy as a hypothesis composite that generates pairs of pairs of small integer values in which the first value in each nested pair is always less than or equal to the second value:

import polars as pl
from polars.testing.parametric import create_list_strategy, dataframes, column
from hypothesis.strategies import composite
from hypothesis import given


@composite
def uint8_pairs(draw, uints=create_list_strategy(pl.UInt8, size=2)):
    pairs = list(zip(draw(uints), draw(uints)))
    return [sorted(ints) for ints in pairs]


@given(
    dataframes(
        cols=[
            column("colx", strategy=uint8_pairs()),
            column("coly", strategy=uint8_pairs()),
            column("colz", strategy=uint8_pairs()),
        ],
        size=3,
    )
)
def test_miscellaneous(df):
    ...

    # Example frame:
    # ┌─────────────────────────┬─────────────────────────┬──────────────────────────┐
    # │ colx                    ┆ coly                    ┆ colz                     │
    # │ ---                     ┆ ---                     ┆ ---                      │
    # │ list[list[i64]]         ┆ list[list[i64]]         ┆ list[list[i64]]          │
    # ╞═════════════════════════╪═════════════════════════╪══════════════════════════╡
    # │ [[143, 235], [75, 101]] ┆ [[143, 235], [75, 101]] ┆ [[31, 41], [57, 250]]    │
    # │ [[87, 186], [174, 179]] ┆ [[87, 186], [174, 179]] ┆ [[112, 213], [149, 221]] │
    # │ [[23, 85], [7, 86]]     ┆ [[23, 85], [7, 86]]     ┆ [[22, 255], [27, 28]]    │
    # └─────────────────────────┴─────────────────────────┴──────────────────────────┘