polars.StringCache

class polars.StringCache

Context manager that enables a global string cache, so that Categorical columns built from different data sources share the same string-to-category mapping.

The string categories are cached only for the duration of the context. If StringCache contexts are nested, the global cache is invalidated only when the outermost context exits.

Examples

>>> import polars as pl
>>> with pl.StringCache():
...     df1 = pl.DataFrame(
...         data={
...             "color": ["red", "green", "blue", "orange"],
...             "value": [1, 2, 3, 4],
...         },
...         schema={"color": pl.Categorical, "value": pl.UInt8},
...     )
...     df2 = pl.DataFrame(
...         data={
...             "color": ["yellow", "green", "orange", "black", "red"],
...             "char": ["a", "b", "c", "d", "e"],
...         },
...         schema={"color": pl.Categorical, "char": pl.Utf8},
...     )
...
...     # Both dataframes use the same string cache for the categorical column,
...     # so the join operation on that column will succeed.
...     df_join = df1.join(df2, how="inner", on="color")
...
>>> df_join
shape: (3, 3)
┌────────┬───────┬──────┐
│ color  ┆ value ┆ char │
│ ---    ┆ ---   ┆ ---  │
│ cat    ┆ u8    ┆ str  │
╞════════╪═══════╪══════╡
│ green  ┆ 2     ┆ b    │
│ orange ┆ 4     ┆ c    │
│ red    ┆ 1     ┆ e    │
└────────┴───────┴──────┘

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)