polars.DataFrame.rows_by_key#
- DataFrame.rows_by_key(
- key: ColumnNameOrSelector | Sequence[ColumnNameOrSelector],
- *,
- named: bool = False,
- include_key: bool = False,
- unique: bool = False,
Returns all data as a dictionary of python-native values keyed by some column.
This method is like
rows
, but instead of returning rows in a flat list, rows are grouped by the values in thekey
column(s) and returned as a dictionary.Note that this method should not be used in place of native operations, due to the high cost of materializing all frame data out into a dictionary; it should be used only when you need to move the values out into a Python data structure or other object that cannot operate directly with Polars/Arrow.
- Parameters:
- key
The column(s) to use as the key for the returned dictionary. If multiple columns are specified, the key will be a tuple of those values, otherwise it will be a string.
- named
Return dictionary rows instead of tuples, mapping column name to row value.
- include_key
Include key values inline with the associated data (by default the key values are omitted as a memory/performance optimisation, as they can be reoconstructed from the key).
- unique
Indicate that the key is unique; this will result in a 1:1 mapping from key to a single associated row. Note that if the key is not actually unique the last row with the given key will be returned.
See also
Notes
If you have
ns
-precision temporal values you should be aware that Python natively only supports up toμs
-precision;ns
-precision values will be truncated to microseconds on conversion to Python. If this matters to your use-case you should export to a different format (such as Arrow or NumPy).Examples
>>> df = pl.DataFrame( ... { ... "w": ["a", "b", "b", "a"], ... "x": ["q", "q", "q", "k"], ... "y": [1.0, 2.5, 3.0, 4.5], ... "z": [9, 8, 7, 6], ... } ... )
Group rows by the given key column(s):
>>> df.rows_by_key(key=["w"]) defaultdict(<class 'list'>, {'a': [('q', 1.0, 9), ('k', 4.5, 6)], 'b': [('q', 2.5, 8), ('q', 3.0, 7)]})
Return the same row groupings as dictionaries:
>>> df.rows_by_key(key=["w"], named=True) defaultdict(<class 'list'>, {'a': [{'x': 'q', 'y': 1.0, 'z': 9}, {'x': 'k', 'y': 4.5, 'z': 6}], 'b': [{'x': 'q', 'y': 2.5, 'z': 8}, {'x': 'q', 'y': 3.0, 'z': 7}]})
Return row groupings, assuming keys are unique:
>>> df.rows_by_key(key=["z"], unique=True) {9: ('a', 'q', 1.0), 8: ('b', 'q', 2.5), 7: ('b', 'q', 3.0), 6: ('a', 'k', 4.5)}
Return row groupings as dictionaries, assuming keys are unique:
>>> df.rows_by_key(key=["z"], named=True, unique=True) {9: {'w': 'a', 'x': 'q', 'y': 1.0}, 8: {'w': 'b', 'x': 'q', 'y': 2.5}, 7: {'w': 'b', 'x': 'q', 'y': 3.0}, 6: {'w': 'a', 'x': 'k', 'y': 4.5}}
Return dictionary rows grouped by a compound key, including key values:
>>> df.rows_by_key(key=["w", "x"], named=True, include_key=True) defaultdict(<class 'list'>, {('a', 'q'): [{'w': 'a', 'x': 'q', 'y': 1.0, 'z': 9}], ('b', 'q'): [{'w': 'b', 'x': 'q', 'y': 2.5, 'z': 8}, {'w': 'b', 'x': 'q', 'y': 3.0, 'z': 7}], ('a', 'k'): [{'w': 'a', 'x': 'k', 'y': 4.5, 'z': 6}]})