Testing#
Asserts#
Polars provides some standard asserts for use with unit tests:
|
Raise detailed AssertionError if left does NOT equal right. |
|
Raise detailed AssertionError if left does NOT equal right. |
Parametric testing#
See the Hypothesis library for more details about property-based testing, strategies, and library integrations:
Polars primitives#
Polars provides the following hypothesis testing primitives and strategy generators/helpers to make it easy to generate suitable test DataFrames and Series.
Hypothesis strategy for producing polars DataFrames or LazyFrames. |
|
|
Hypothesis strategy for producing polars Series. |
Strategy helpers#
|
Define a column for use with the @dataframes strategy. |
|
Define multiple columns for use with the @dataframes strategy. |
Hypothesis strategy for producing polars List data. |
Profiles#
Several standard/named hypothesis profiles are provided:
fast
: runs 100 iterations.
balanced
: runs 1,000 iterations.
expensive
: runs 10,000 iterations.
The load/set helper functions allow you to access these profiles directly, set your preferred profile (default is fast), or set a custom number of iterations.
|
Load a named (or custom) hypothesis profile for use with the parametric tests. |
|
Set the env var |
Approximate profile timings:
Running polars’ own parametric unit tests on 0.17.6 against release and debug builds, on a machine with 12 cores, using xdist -n auto results in the following timings (these values are indicative only, and may vary significantly depending on your own hardware setup):
Profile | Iterations | Release | Debug |
---|---|---|---|
fast | 100 | ~6 secs | ~8 secs |
balanced | 1,000 | ~22 secs | ~30 secs |
expensive | 10,000 | ~3 mins 5 secs | ~4 mins 45 secs |
Examples#
Basic: Create a parametric unit test that will receive a series of generated DataFrames, each having 5 numeric columns with a 10% chance of any generated value being null (this is distinct from NaN).
from polars.testing.parametric import dataframes
from polars import NUMERIC_DTYPES
from hypothesis import given

@given(
    dataframes(
        cols=5,
        # 10% chance that any generated value is null
        null_probability=0.1,
        allowed_dtypes=NUMERIC_DTYPES,
    )
)
def test_numeric(df):
    assert all(df[col].is_numeric() for col in df.columns)

# Example frame:
# ┌──────┬────────┬───────┬────────────┬────────────┐
# │ col0 ┆ col1 ┆ col2 ┆ col3 ┆ col4 │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ u8 ┆ i16 ┆ u16 ┆ i32 ┆ f64 │
# ╞══════╪════════╪═══════╪════════════╪════════════╡
# │ 54 ┆ -29096 ┆ 485 ┆ 2147483647 ┆ -2.8257e14 │
# │ null ┆ 7508 ┆ 37338 ┆ 7264 ┆ 1.5 │
# │ 0 ┆ 321 ┆ null ┆ 16996 ┆ NaN │
# │ 121 ┆ -361 ┆ 63204 ┆ 1 ┆ 1.1443e235 │
# └──────┴────────┴───────┴────────────┴────────────┘
Intermediate: Integrate hypothesis-native strategies into specifically-named columns, generating a series of LazyFrames, with a minimum size of five rows and values that conform to the given strategies:
from polars.testing.parametric import column, dataframes
from hypothesis.strategies import floats, sampled_from, text
from hypothesis import given
from string import ascii_letters, digits

id_chars = ascii_letters + digits

@given(
    dataframes(
        cols=[
            column("id", strategy=text(min_size=4, max_size=4, alphabet=id_chars)),
            column("ccy", strategy=sampled_from(["GBP", "EUR", "JPY", "USD"])),
            column("price", strategy=floats(min_value=0.0, max_value=1000.0)),
        ],
        min_size=5,
        lazy=True,
    )
)
def test_price_calculations(lf):
    ...
    print(lf.collect())

# Example frame:
# ┌──────┬─────┬─────────┐
# │ id ┆ ccy ┆ price │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ f64 │
# ╞══════╪═════╪═════════╡
# │ A101 ┆ GBP ┆ 1.1 │
# │ 8nIn ┆ JPY ┆ 1.5 │
# │ QHoO ┆ EUR ┆ 714.544 │
# │ i0e0 ┆ GBP ┆ 0.0 │
# │ 0000 ┆ USD ┆ 999.0 │
# └──────┴─────┴─────────┘
Advanced: Create and use a List[UInt8] dtype strategy as a hypothesis composite that generates pairs of pairs of small integer values in which the first value in each nested pair is always less than or equal to the second value:
import polars as pl
from polars.testing.parametric import create_list_strategy, dataframes, column
from hypothesis.strategies import composite
from hypothesis import given

@composite
def uint8_pairs(draw, uints=create_list_strategy(pl.UInt8, size=2)):
    # draw two length-2 lists, zip them into pairs, then sort each pair
    pairs = list(zip(draw(uints), draw(uints)))
    return [sorted(ints) for ints in pairs]

@given(
    dataframes(
        cols=[
            column("colx", strategy=uint8_pairs()),
            column("coly", strategy=uint8_pairs()),
            column("colz", strategy=uint8_pairs()),
        ],
        size=3,
    )
)
def test_miscellaneous(df):
    ...

# Example frame:
# ┌─────────────────────────┬─────────────────────────┬──────────────────────────┐
# │ colx ┆ coly ┆ colz │
# │ --- ┆ --- ┆ --- │
# │ list[list[i64]] ┆ list[list[i64]] ┆ list[list[i64]] │
# ╞═════════════════════════╪═════════════════════════╪══════════════════════════╡
# │ [[143, 235], [75, 101]] ┆ [[143, 235], [75, 101]] ┆ [[31, 41], [57, 250]] │
# │ [[87, 186], [174, 179]] ┆ [[87, 186], [174, 179]] ┆ [[112, 213], [149, 221]] │
# │ [[23, 85], [7, 86]] ┆ [[23, 85], [7, 86]] ┆ [[22, 255], [27, 28]] │
# └─────────────────────────┴─────────────────────────┴──────────────────────────┘