polars.json_normalize

polars.json_normalize(
data: dict[Any, Any] | Sequence[dict[Any, Any] | Any],
*,
separator: str = '.',
max_level: int | None = None,
schema: Schema | None = None,
strict: bool = True,
infer_schema_length: int | None = 100,
encoder: JSONEncoder | None = None,
) -> DataFrame

Normalize semi-structured deserialized JSON data into a flat table.

Dictionary objects that will not be unnested/normalized are encoded as JSON string data. Unlike its pandas counterpart, this function will not encode dictionaries as objects at any level.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
data

Deserialized JSON objects.

separator

Nested records will generate column names joined by the separator. For example, with separator=".", {"foo": {"bar": 0}} produces the column foo.bar.

max_level

Maximum number of levels (dict depth) to normalize. If None, all levels are normalized.

schema

Overwrite the Schema when the normalized data is passed to the DataFrame constructor.

strict

Whether Polars should be strict when constructing the DataFrame.

infer_schema_length

The number of rows to inspect when inferring the schema.

encoder

Custom JSON encoder function; if not given, json.dumps is used.

Examples

>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 180, "weight": 85},
...     },
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 155, "weight": 58},
...     },
...     {
...         "name": "Mark Reg",
...         "fitness": {"height": 170, "weight": 78},
...     },
... ]
>>> pl.json_normalize(data, max_level=1)
shape: (3, 4)
┌──────┬────────────┬────────────────┬────────────────┐
│ id   ┆ name       ┆ fitness.height ┆ fitness.weight │
│ ---  ┆ ---        ┆ ---            ┆ ---            │
│ i64  ┆ str        ┆ i64            ┆ i64            │
╞══════╪════════════╪════════════════╪════════════════╡
│ 1    ┆ Cole Volk  ┆ 180            ┆ 85             │
│ 2    ┆ Faye Raker ┆ 155            ┆ 58             │
│ null ┆ Mark Reg   ┆ 170            ┆ 78             │
└──────┴────────────┴────────────────┴────────────────┘

Normalize to a specific depth, using a custom JSON encoder (note that orjson.dumps encodes to bytes, not str).

>>> import orjson
>>> pl.json_normalize(data, max_level=0, encoder=orjson.dumps)
shape: (3, 3)
┌──────┬────────────┬───────────────────────────────┐
│ id   ┆ name       ┆ fitness                       │
│ ---  ┆ ---        ┆ ---                           │
│ i64  ┆ str        ┆ binary                        │
╞══════╪════════════╪═══════════════════════════════╡
│ 1    ┆ Cole Volk  ┆ b"{"height":180,"weight":85}" │
│ 2    ┆ Faye Raker ┆ b"{"height":155,"weight":58}" │
│ null ┆ Mark Reg   ┆ b"{"height":170,"weight":78}" │
└──────┴────────────┴───────────────────────────────┘