polars.json_normalize

polars.json_normalize(
data: dict[Any, Any] | Sequence[dict[Any, Any] | Any],
*,
separator: str = '.',
max_level: int | None = None,
schema: Schema | None = None,
strict: bool = True,
infer_schema_length: int | None = 100,
encoder: JSONEncoder | None = None,
) -> DataFrame

Normalize semi-structured deserialized JSON data into a flat table.

Dictionary objects that will not be unnested/normalized are encoded as JSON string data. Unlike its pandas counterpart, this function will not encode dictionaries as objects at any level.

Warning

This functionality is considered unstable. It may be changed at any point without it being considered a breaking change.

Parameters:
data

Deserialized JSON objects.

separator

Nested records will generate column names joined by the separator. For example, with separator=".", {"foo": {"bar": 0}} produces the column foo.bar.

max_level

Maximum number of levels (dict depth) to normalize. If None, all levels are normalized.

schema

Overwrite the Schema when the normalized data is passed to the DataFrame constructor.

strict

Whether Polars should be strict when constructing the DataFrame.

infer_schema_length

The number of rows to inspect when inferring the schema.

encoder

Custom JSON encoder function; if not given, json.dumps is used.

Examples

>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 180, "weight": 85},
...     },
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 155, "weight": 58},
...     },
...     {
...         "name": "Mark Reg",
...         "fitness": {"height": 170, "weight": 78},
...     },
... ]
>>> pl.json_normalize(data, max_level=1)
shape: (3, 4)
┌──────┬────────────┬────────────────┬────────────────┐
│ id   ┆ name       ┆ fitness.height ┆ fitness.weight │
│ ---  ┆ ---        ┆ ---            ┆ ---            │
│ i64  ┆ str        ┆ i64            ┆ i64            │
╞══════╪════════════╪════════════════╪════════════════╡
│ 1    ┆ Cole Volk  ┆ 180            ┆ 85             │
│ 2    ┆ Faye Raker ┆ 155            ┆ 58             │
│ null ┆ Mark Reg   ┆ 170            ┆ 78             │
└──────┴────────────┴────────────────┴────────────────┘

Normalize to a specific depth, using a custom JSON encoder (note that orjson.dumps encodes to bytes, not str).

>>> import orjson
>>> pl.json_normalize(data, max_level=0, encoder=orjson.dumps)
shape: (3, 3)
┌──────┬────────────┬───────────────────────────────┐
│ id   ┆ name       ┆ fitness                       │
│ ---  ┆ ---        ┆ ---                           │
│ i64  ┆ str        ┆ binary                        │
╞══════╪════════════╪═══════════════════════════════╡
│ 1    ┆ Cole Volk  ┆ b"{"height":180,"weight":85}" │
│ 2    ┆ Faye Raker ┆ b"{"height":155,"weight":58}" │
│ null ┆ Mark Reg   ┆ b"{"height":170,"weight":78}" │
└──────┴────────────┴───────────────────────────────┘