polars.DataFrame.rows_by_key#

DataFrame.rows_by_key( key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector], *, named: bool = False, include_key: bool = False, unique: bool = False, ) → dict[Any, Iterable[Any]][source]#

Returns all data as a dictionary of python-native values keyed by some column.

This method is like rows, but instead of returning rows in a flat list, rows are grouped by the values in the key column(s) and returned as a dictionary.

Note that this method should not be used in place of native operations, due to the high cost of materializing all frame data out into a dictionary; it should be used only when you need to move the values out into a Python data structure or other object that cannot operate directly with Polars/Arrow.

Parameters:

key: The column(s) to use as the key for the returned dictionary. If multiple columns are specified, the key will be a tuple of those values, otherwise it will be a string.
named: Return dictionary rows instead of tuples, mapping column name to row value.
include_key: Include key values inline with the associated data (by default the key values are omitted as a memory/performance optimisation, as they can be reoconstructed from the key).
unique: Indicate that the key is unique; this will result in a 1:1 mapping from key to a single associated row. Note that if the key is not actually unique the last row with the given key will be returned.

See also

rows: Materialize all frame data as a list of rows (potentially expensive).
iter_rows: Row iterator over frame data (does not materialize all rows).

Notes

If you have ns-precision temporal values you should be aware that Python natively only supports up to μs-precision; ns-precision values will be truncated to microseconds on conversion to Python. If this matters to your use-case you should export to a different format (such as Arrow or NumPy).

Examples

>>> df = pl.DataFrame(
...     {
...         "w": ["a", "b", "b", "a"],
...         "x": ["q", "q", "q", "k"],
...         "y": [1.0, 2.5, 3.0, 4.5],
...         "z": [9, 8, 7, 6],
...     }
... )

Group rows by the given key column(s):

>>> df.rows_by_key(key=["w"])
defaultdict(<class 'list'>,
    {'a': [('q', 1.0, 9), ('k', 4.5, 6)],
     'b': [('q', 2.5, 8), ('q', 3.0, 7)]})

Return the same row groupings as dictionaries:

>>> df.rows_by_key(key=["w"], named=True)
defaultdict(<class 'list'>,
    {'a': [{'x': 'q', 'y': 1.0, 'z': 9},
           {'x': 'k', 'y': 4.5, 'z': 6}],
     'b': [{'x': 'q', 'y': 2.5, 'z': 8},
           {'x': 'q', 'y': 3.0, 'z': 7}]})

Return row groupings, assuming keys are unique:

>>> df.rows_by_key(key=["z"], unique=True)
{9: ('a', 'q', 1.0),
 8: ('b', 'q', 2.5),
 7: ('b', 'q', 3.0),
 6: ('a', 'k', 4.5)}

Return row groupings as dictionaries, assuming keys are unique:

>>> df.rows_by_key(key=["z"], named=True, unique=True)
{9: {'w': 'a', 'x': 'q', 'y': 1.0},
 8: {'w': 'b', 'x': 'q', 'y': 2.5},
 7: {'w': 'b', 'x': 'q', 'y': 3.0},
 6: {'w': 'a', 'x': 'k', 'y': 4.5}}

Return dictionary rows grouped by a compound key, including key values:

>>> df.rows_by_key(key=["w", "x"], named=True, include_key=True)
defaultdict(<class 'list'>,
    {('a', 'q'): [{'w': 'a', 'x': 'q', 'y': 1.0, 'z': 9}],
     ('b', 'q'): [{'w': 'b', 'x': 'q', 'y': 2.5, 'z': 8},
                  {'w': 'b', 'x': 'q', 'y': 3.0, 'z': 7}],
     ('a', 'k'): [{'w': 'a', 'x': 'k', 'y': 4.5, 'z': 6}]})