polars.testing.parametric.dataframes

polars.testing.parametric.dataframes(
    cols: int | column | Sequence[column] | None = None,
    *,
    lazy: bool = False,
    min_cols: int = 1,
    max_cols: int = 5,
    min_size: int = 0,
    max_size: int = 5,
    include_cols: Sequence[column] | column | None = None,
    allow_null: bool | Mapping[str, bool] = True,
    allow_chunks: bool = True,
    allow_masked_out: bool = True,
    allowed_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
    excluded_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
    allow_time_zones: bool = True,
    **kwargs: Any,
) → SearchStrategy[DataFrame | LazyFrame]

Hypothesis strategy for producing Polars DataFrames or LazyFrames.

Warning

This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:

cols ({int, columns}, optional): integer number of columns to create, or a sequence of column objects that describe the desired DataFrame column data.
lazy (bool, optional): produce a LazyFrame instead of a DataFrame.
min_cols (int, optional): if not passing an exact number of columns, set the minimum number of columns here (defaults to 1, per the signature above).
max_cols (int, optional): if not passing an exact number of columns, set the maximum number of columns here (defaults to 5, per the signature above).
min_size (int, optional): if not passing an exact size, set the minimum number of rows in the DataFrame.
max_size (int, optional): if not passing an exact size, set the maximum number of rows in the DataFrame.
include_cols ([column], optional): a list of column objects to include in the generated DataFrame. Note that explicitly provided columns are appended onto the list of existing columns (if any are present).
allow_null (bool or Mapping[str, bool]): Allow nulls as possible values and allow the Null data type by default. Accepts either a boolean or a mapping of column names to booleans.
allow_chunks (bool): Allow the DataFrame to contain multiple chunks.
allow_masked_out (bool): Allow the nulls to contain masked out elements.
allowed_dtypes ({list, set}, optional): when automatically generating data, allow only these dtypes.
excluded_dtypes ({list, set}, optional): when automatically generating data, exclude these dtypes.
allow_time_zones (bool): Allow generating Datetime columns with a time zone.
**kwargs: Additional keyword arguments that are passed to the underlying data generation strategies.
size (int, optional): if set, will create a DataFrame of exactly this size (and ignore the min_size/max_size parameters).

Deprecated since version 1.0.0: Use min_size and max_size instead.
null_probability ({float, dict[str, float]}, optional): probability (between 0.0 and 1.0) that a generated value is None. This is applied independently of any None values generated by the underlying strategy, and can be applied either on a per-column basis (if given as a {col: probability} dict) or globally. If null_probability is defined on a column, it takes precedence over the global value.

Deprecated since version 0.20.26: Use allow_null instead.
allow_infinities (bool, optional): optionally disallow generation of +/-inf values for floating-point dtypes.

Deprecated since version 0.20.26: Use allow_infinity instead.

Notes

In actual usage this is deployed as a unit test decorator, providing a strategy that generates DataFrames or LazyFrames with the given characteristics for the unit test. While developing a strategy/test, it can also be useful to call .example() directly on a given strategy to see concrete instances of the generated data.

Examples

The strategy is generally used to generate DataFrames in a unit test:

>>> import polars as pl
>>> from polars.testing.parametric import dataframes
>>> from hypothesis import given
>>> @given(df=dataframes(min_size=3, max_size=5))
... def test_df_height(df: pl.DataFrame) -> None:
...     assert 3 <= df.height <= 5

Drawing examples interactively is also possible with the .example() method. This should be avoided while running tests.

>>> df = dataframes(allowed_dtypes=[pl.Datetime, pl.Float64], max_cols=3)
>>> df.example()  
shape: (3, 3)
┌─────────────┬────────────────────────────┬───────────┐
│ col0        ┆ col1                       ┆ col2      │
│ ---         ┆ ---                        ┆ ---       │
│ f64         ┆ datetime[ns]               ┆ f64       │
╞═════════════╪════════════════════════════╪═══════════╡
│ NaN         ┆ 1844-07-05 06:19:48.848808 ┆ 3.1436e16 │
│ -1.9914e218 ┆ 2068-12-01 23:05:11.412277 ┆ 2.7415e16 │
│ 0.5         ┆ 2095-11-19 22:05:17.647961 ┆ -0.5      │
└─────────────┴────────────────────────────┴───────────┘

Use column for more control over exactly which columns are generated.

>>> from polars.testing.parametric import column
>>> dfs = dataframes(
...     [
...         column("x", dtype=pl.Int32),
...         column("y", dtype=pl.Float64),
...     ],
...     min_size=2,
...     max_size=2,
... )
>>> dfs.example()  
shape: (2, 2)
┌───────────┬────────────┐
│ x         ┆ y          │
│ ---       ┆ ---        │
│ i32       ┆ f64        │
╞═══════════╪════════════╡
│ -15836    ┆ 1.1755e-38 │
│ 575050513 ┆ NaN        │
└───────────┴────────────┘