polars.DataFrame.to_torch#
- DataFrame.to_torch(
- return_type: TorchExportType = 'tensor',
- *,
- label: str | Expr | Sequence[str | Expr] | None = None,
- features: str | Expr | Sequence[str | Expr] | None = None,
- dtype: PolarsDataType | None = None,
Convert DataFrame to a 2D PyTorch tensor, Dataset, or dict of Tensors.
New in version 0.20.23.
- Parameters:
- return_type{“tensor”, “dataset”, “dict”}
Set return type; a 2D PyTorch tensor, PolarsDataset (a frame-specialized TensorDataset), or dict of Tensors.
- label
One or more column names, expressions, or selectors that label the feature data; when
return_type
is “dataset”, the PolarsDataset will return(features, label)
tensor tuples for each row. Otherwise, it returns(features,)
tensor tuples where the feature contains all the row data; note that setting this parameter with any other result type will raise an informative error.- features
One or more column names, expressions, or selectors that contain the feature data; if omitted, all columns that are not designated as part of the label are used. This parameter is a no-op for return-types other than “dataset”.
- dtype
Unify the dtype of all returned tensors; this casts any frame Series that are not of the required dtype before converting to tensor. This includes the label column unless the label is an expression (such as
pl.col("label_column").cast(pl.Int16)
).
Examples
>>> df = pl.DataFrame( ... { ... "lbl": [0, 1, 2, 3], ... "feat1": [1, 0, 0, 1], ... "feat2": [1.5, -0.5, 0.0, -2.25], ... } ... )
Standard return type (Tensor), with f32 supertype:
>>> df.to_torch(dtype=pl.Float32) tensor([[ 0.0000, 1.0000, 1.5000], [ 1.0000, 0.0000, -0.5000], [ 2.0000, 0.0000, 0.0000], [ 3.0000, 1.0000, -2.2500]])
As a dictionary of individual Tensors:
>>> df.to_torch("dict") {'lbl': tensor([0, 1, 2, 3]), 'feat1': tensor([1, 0, 0, 1]), 'feat2': tensor([ 1.5000, -0.5000, 0.0000, -2.2500], dtype=torch.float64)}
As a PolarsDataset, with f64 supertype:
>>> ds = df.to_torch("dataset", dtype=pl.Float64) >>> ds[3] (tensor([ 3.0000, 1.0000, -2.2500], dtype=torch.float64),) >>> ds[:2] (tensor([[ 0.0000, 1.0000, 1.5000], [ 1.0000, 0.0000, -0.5000]], dtype=torch.float64),) >>> ds[[0, 3]] (tensor([[ 0.0000, 1.0000, 1.5000], [ 3.0000, 1.0000, -2.2500]], dtype=torch.float64),)
As a convenience the PolarsDataset can opt-in to half-precision data for experimentation (usually this would be set on the model/pipeline):
>>> list(ds.half()) [(tensor([0.0000, 1.0000, 1.5000], dtype=torch.float16),), (tensor([ 1.0000, 0.0000, -0.5000], dtype=torch.float16),), (tensor([2., 0., 0.], dtype=torch.float16),), (tensor([ 3.0000, 1.0000, -2.2500], dtype=torch.float16),)]
Pass PolarsDataset to a DataLoader, designating the label:
>>> from torch.utils.data import DataLoader >>> ds = df.to_torch("dataset", label="lbl") >>> dl = DataLoader(ds, batch_size=2) >>> batches = list(dl) >>> batches[0] [tensor([[ 1.0000, 1.5000], [ 0.0000, -0.5000]], dtype=torch.float64), tensor([0, 1])]
Note that labels can be given as expressions, allowing them to have a dtype independent of the feature columns (multi-column labels are supported).
>>> ds = df.to_torch( ... "dataset", ... dtype=pl.Float32, ... label=pl.col("lbl").cast(pl.Int16), ... ) >>> ds[:2] (tensor([[ 1.0000, 1.5000], [ 0.0000, -0.5000]]), tensor([0, 1], dtype=torch.int16))
Easily integrate with (for example) scikit-learn and other datasets:
>>> from sklearn.datasets import fetch_california_housing >>> housing = fetch_california_housing() >>> df = pl.DataFrame( ... data=housing.data, ... schema=housing.feature_names, ... ).with_columns( ... Target=housing.target, ... ) >>> train = df.to_torch("dataset", label="Target") >>> loader = DataLoader( ... train, ... shuffle=True, ... batch_size=64, ... )