polars.testing.parametric.dataframes
- polars.testing.parametric.dataframes(
- cols: int | column | Sequence[column] | None = None,
- *,
- lazy: bool = False,
- min_cols: int = 1,
- max_cols: int = 5,
- min_size: int = 0,
- max_size: int = 5,
- include_cols: Sequence[column] | column | None = None,
- allow_null: bool | Mapping[str, bool] = True,
- allow_chunks: bool = True,
- allow_masked_out: bool = True,
- allowed_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
- excluded_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
- allow_time_zones: bool = True,
- **kwargs: Any,
Hypothesis strategy for producing Polars DataFrames or LazyFrames.
Warning
This functionality is currently considered unstable. It may be changed at any point without it being considered a breaking change.
- Parameters:
- cols : {int, columns}, optional
integer number of columns to create, or a sequence of column objects that describe the desired DataFrame column data.
- lazy : bool, optional
produce a LazyFrame instead of a DataFrame.
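- min_cols : int, optional
if not passing an exact number of columns, set the minimum number of columns here (defaults to 1, per the signature above).
- max_cols : int, optional
if not passing an exact number of columns, set the maximum number of columns here (defaults to 5, per the signature above).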
- min_size : int, optional
if not passing an exact size, set the minimum number of rows in the DataFrame.
- max_size : int, optional
if not passing an exact size, set the maximum number of rows in the DataFrame.
- include_cols : [column], optional
a list of column objects to include in the generated DataFrame. Note that explicitly provided columns are appended onto the list of existing columns (if any are present).
- allow_null : bool or Mapping[str, bool]
Allow nulls as possible values and allow the Null data type by default. Accepts either a boolean or a mapping of column names to booleans.
- allow_chunks : bool
Allow the DataFrame to contain multiple chunks.
- allow_masked_out : bool
Allow the nulls to contain masked out elements.
- allowed_dtypes : {list, set}, optional
when automatically generating data, allow only these dtypes.
- excluded_dtypes : {list, set}, optional
when automatically generating data, exclude these dtypes.
- allow_time_zones : bool
Allow generating Datetime columns with a time zone.
- **kwargs
Additional keyword arguments that are passed to the underlying data generation strategies.
- size : int, optional
if set, will create a DataFrame of exactly this size (and ignore the min_size/max_size parameters).
Deprecated since version 1.0.0: Use min_size and max_size instead.
- null_probability : {float, dict[str, float]}, optional
probability (expressed as a value between 0.0 and 1.0) that a generated value is None. This is applied independently of any None values generated by the underlying strategy, and can be applied either on a per-column basis (if given as a {col: pct} dict) or globally. If null_probability is defined on a column, it takes precedence over the global value.
Deprecated since version 0.20.26: Use allow_null instead.
- allow_infinities : bool, optional
optionally disallow generation of +/-inf values for floating-point dtypes.
Deprecated since version 0.20.26: Use allow_infinity instead.
Notes
In actual usage this is deployed as a unit test decorator, providing a strategy that generates DataFrames or LazyFrames with the given characteristics for the unit test. While developing a strategy/test, it can also be useful to call .example() directly on a given strategy to see concrete instances of the generated data.
Examples
The strategy is generally used to generate DataFrames in a unit test:
>>> import polars as pl
>>> from polars.testing.parametric import dataframes
>>> from hypothesis import given
>>> @given(df=dataframes(min_size=3, max_size=5))
... def test_df_height(df: pl.DataFrame) -> None:
...     assert 3 <= df.height <= 5
Drawing examples interactively is also possible with the .example() method. This should be avoided while running tests.
>>> df = dataframes(allowed_dtypes=[pl.Datetime, pl.Float64], max_cols=3)
>>> df.example()
shape: (3, 3)
┌─────────────┬────────────────────────────┬───────────┐
│ col0        ┆ col1                       ┆ col2      │
│ ---         ┆ ---                        ┆ ---       │
│ f64         ┆ datetime[ns]               ┆ f64       │
╞═════════════╪════════════════════════════╪═══════════╡
│ NaN         ┆ 1844-07-05 06:19:48.848808 ┆ 3.1436e16 │
│ -1.9914e218 ┆ 2068-12-01 23:05:11.412277 ┆ 2.7415e16 │
│ 0.5         ┆ 2095-11-19 22:05:17.647961 ┆ -0.5      │
└─────────────┴────────────────────────────┴───────────┘
Use column for more control over exactly which columns are generated.
>>> from polars.testing.parametric import column
>>> dfs = dataframes(
...     [
...         column("x", dtype=pl.Int32),
...         column("y", dtype=pl.Float64),
...     ],
...     min_size=2,
...     max_size=2,
... )
>>> dfs.example()
shape: (2, 2)
┌───────────┬────────────┐
│ x         ┆ y          │
│ ---       ┆ ---        │
│ i32       ┆ f64        │
╞═══════════╪════════════╡
│ -15836    ┆ 1.1755e-38 │
│ 575050513 ┆ NaN        │
└───────────┴────────────┘