polars.DataFrame.to_numpy
DataFrame.to_numpy(
    *,
    order: IndexOrder = 'fortran',
    writable: bool = False,
    allow_copy: bool = True,
    structured: bool = False,
    use_pyarrow: bool | None = None,
)
Convert this DataFrame to a NumPy ndarray.
This operation copies data only when necessary. The conversion is zero copy when all of the following hold:
The DataFrame is fully contiguous in memory, with all Series back-to-back and all Series consisting of a single chunk.
The data type is an integer or float.
The DataFrame contains no null values.
The order parameter is set to "fortran" (default).
The writable parameter is set to False (default).
- Parameters:
- order
The index order of the returned NumPy array, either C-like or Fortran-like. In general, using the Fortran-like index order is faster. However, the C-like order might be more appropriate to use for downstream applications to prevent cloning data, e.g. when reshaping into a one-dimensional array.
- writable
Ensure the resulting array is writable. This will force a copy of the data if the array was created without copy, as the underlying Arrow data is immutable.
- allow_copy
Allow memory to be copied to perform the conversion. If set to False, causes conversions that are not zero-copy to fail.
- structured
Return a structured array with a data type that corresponds to the DataFrame schema. If set to False (default), a 2D ndarray is returned instead.
- use_pyarrow
Use the pyarrow.Array.to_numpy function for the conversion to NumPy if necessary.
Deprecated since version 0.20.28: Polars now uses its native engine by default for conversion to NumPy.
Examples
Numeric data without nulls can be converted without copying data in some cases. The resulting array will not be writable.
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> arr = df.to_numpy()
>>> arr
array([[1],
       [2],
       [3]])
>>> arr.flags.writeable
False
Set writable=True to force a copy of the data so that the array is writable.
>>> df.to_numpy(writable=True).flags.writeable
True
If the DataFrame contains different numeric data types, the resulting data type will be the supertype. This requires data to be copied. Integer types with nulls are cast to a float type with nan representing a null value.
>>> df = pl.DataFrame({"a": [1, 2, None], "b": [4.0, 5.0, 6.0]})
>>> df.to_numpy()
array([[ 1.,  4.],
       [ 2.,  5.],
       [nan,  6.]])
Set allow_copy=False to raise an error if data would be copied.
>>> df.to_numpy(allow_copy=False)
Traceback (most recent call last):
...
RuntimeError: copy not allowed: cannot convert to a NumPy array without copying data
Polars defaults to F-contiguous order. Use order="c" to force the resulting array to be C-contiguous.
>>> df.to_numpy(order="c").flags.c_contiguous
True
DataFrames with mixed types will result in an array with an object dtype.
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.5, 7.0, 8.5],
...         "ham": ["a", "b", "c"],
...     },
...     schema_overrides={"foo": pl.UInt8, "bar": pl.Float32},
... )
>>> df.to_numpy()
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)
Set structured=True to convert to a structured array, which can better preserve individual column data such as name and data type.
>>> df.to_numpy(structured=True)
array([(1, 6.5, 'a'), (2, 7. , 'b'), (3, 8.5, 'c')],
      dtype=[('foo', 'u1'), ('bar', '<f4'), ('ham', '<U1')])