polars.testing.parametric.dataframes
polars.testing.parametric.dataframes(
    cols: int | column | Sequence[column] | None = None,
    *,
    lazy: Literal[False] = False,
    min_cols: int | None = 0,
    max_cols: int | None = MAX_COLS,
    size: int | None = None,
    min_size: int | None = 0,
    max_size: int | None = MAX_DATA_SIZE,
    chunked: bool | None = None,
    include_cols: Sequence[column] | column | None = None,
    null_probability: float | dict[str, float] = 0.0,
    allow_infinities: bool = True,
    allowed_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
    excluded_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
)
polars.testing.parametric.dataframes(
    cols: int | column | Sequence[column] | None = None,
    *,
    lazy: Literal[True],
    min_cols: int | None = 0,
    max_cols: int | None = MAX_COLS,
    size: int | None = None,
    min_size: int | None = 0,
    max_size: int | None = MAX_DATA_SIZE,
    chunked: bool | None = None,
    include_cols: Sequence[column] | column | None = None,
    null_probability: float | dict[str, float] = 0.0,
    allow_infinities: bool = True,
    allowed_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
    excluded_dtypes: Collection[PolarsDataType] | PolarsDataType | None = None,
)
Hypothesis strategy for producing polars DataFrames or LazyFrames.
- Parameters:
- cols : {int, columns}, optional
integer number of columns to create, or a sequence of column objects that describe the desired DataFrame column data.
- lazy : bool, optional
produce a LazyFrame instead of a DataFrame.
- min_cols : int, optional
if cols is not given as an exact number of columns, sets the minimum number of columns (defaults to 0).
- max_cols : int, optional
if cols is not given as an exact number of columns, sets the maximum number of columns (defaults to MAX_COLS).
- size : int, optional
if set, creates a DataFrame with exactly this many rows (min_size and max_size are then ignored).
- min_size : int, optional
if not passing an exact size, sets the minimum number of rows in the DataFrame.
- max_size : int, optional
if not passing an exact size, sets the maximum number of rows in the DataFrame.
- chunked : bool, optional
ensure that DataFrames with more than one row have n_chunks > 1. if omitted, chunking will be randomised at the level of individual Series.
- include_cols : [column], optional
a list of column objects to include in the generated DataFrame. note that explicitly provided columns are appended onto the list of existing columns (if any present).
- null_probability : {float, dict[str,float]}, optional
probability (between 0.0 and 1.0) that a generated value is None. this is applied independently of any None values generated by the underlying strategy, and can be set either per column (if given as a {col:pct} dict) or globally. if null_probability is defined for a column, it takes precedence over the global value.
- allow_infinities : bool, optional
optionally disallow generation of +/-inf values for floating-point dtypes.
- allowed_dtypes : {list,set}, optional
when automatically generating data, allow only these dtypes.
- excluded_dtypes : {list,set}, optional
when automatically generating data, exclude these dtypes.
Notes
In actual usage this is deployed as a unit test decorator, providing a strategy that generates DataFrames or LazyFrames with the given characteristics for the unit test. While developing a strategy/test, it can also be useful to call .example() directly on a given strategy to see concrete instances of the generated data.
Examples
Use column or columns to specify the schema of the DataFrames to generate. Note: in actual use the strategy is applied as a test decorator, not used standalone.
>>> import polars as pl
>>> from polars.testing.parametric import column, columns, dataframes
>>> from hypothesis import given
Generate arbitrary DataFrames (as part of a unit test):
>>> @given(df=dataframes())
... def test_repr(df: pl.DataFrame) -> None:
...     assert isinstance(repr(df), str)
Generate LazyFrames with at least 1 column, random dtypes, and at most 5 rows:
>>> dfs = dataframes(min_cols=1, max_size=5, lazy=True)
>>> dfs.example()
<polars.LazyFrame object at 0x11F561580>
Generate DataFrames with known colnames, random dtypes (per test, not per-frame):
>>> dfs = dataframes(columns(["x", "y", "z"]))
>>> dfs.example()
shape: (3, 3)
┌────────────┬───────┬────────────────────────────┐
│ x          ┆ y     ┆ z                          │
│ ---        ┆ ---   ┆ ---                        │
│ date       ┆ u16   ┆ datetime[μs]               │
╞════════════╪═══════╪════════════════════════════╡
│ 0565-08-12 ┆ 34715 ┆ 5844-09-20 00:33:31.076854 │
│ 3382-10-17 ┆ 48662 ┆ 7540-01-29 11:20:14.836271 │
│ 4063-06-17 ┆ 39092 ┆ 1889-05-05 13:25:41.874455 │
└────────────┴───────┴────────────────────────────┘
Generate frames with explicitly named/typed columns and a fixed size:
>>> dfs = dataframes(
...     [
...         column("x", dtype=pl.Int32),
...         column("y", dtype=pl.Float64),
...     ],
...     size=2,
... )
>>> dfs.example()
shape: (2, 2)
┌───────────┬────────────┐
│ x         ┆ y          │
│ ---       ┆ ---        │
│ i32       ┆ f64        │
╞═══════════╪════════════╡
│ -15836    ┆ 1.1755e-38 │
│ 575050513 ┆ NaN        │
└───────────┴────────────┘