polars.LazyFrame.remote

LazyFrame.remote(
context: pc.ComputeContext | None = None,
*,
plan_type: pc._typing.PlanTypePreference = 'dot',
n_retries: int = 0,
engine: pc._typing.Engine = 'auto',
scaling_mode: pc._typing.ScalingMode = 'auto',
) → pc.LazyFrameRemote

Run a query remotely on Polars Cloud.

This allows you to run Polars queries remotely on one or more workers, using several strategies for distributed compute.

Read more in the Announcement post

Parameters:
context

Compute context in which the query is executed. If none is given, the default context is used; an explicit context can be passed as shown in the Examples below.

plan_type: {‘plain’, ‘dot’}

Whether to return the logical plan as a dot diagram or as a plain-text representation.

n_retries:

How many times a stage should be retried on failure.

engine: {‘auto’, ‘streaming’, ‘in-memory’}

A hint telling Polars which engine to prefer; it is not guaranteed to be respected.

scaling_mode: {‘auto’, ‘single-node’, ‘distributed’}

If set to auto, a query that doesn’t explicitly specify a scaling mode via remote().distributed() or remote().single_node() will run in distributed mode if the cluster has more than one node; both are shown in the Examples below.

Examples

Run a query on a cloud instance.

>>> lf = pl.LazyFrame([1, 2, 3]).sum()
>>> in_progress = lf.remote().collect()  
>>> # do some other work
>>> in_progress.await_result()  
shape: (1, 1)
┌──────────┐
│ column_0 │
│ ---      │
│ i64      │
╞══════════╡
│ 6        │
└──────────┘
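
Pass an explicit compute context together with engine and retry hints (see the parameters above). This is a minimal sketch that reuses lf from the previous example: the ComputeContext keyword arguments (workspace, cpus, memory) are assumptions for illustration, while the arguments to remote() follow the signature above.

>>> import polars_cloud as pc
>>> # the ComputeContext keyword arguments below are illustrative assumptions
>>> ctx = pc.ComputeContext(workspace="my-workspace", cpus=16, memory=64)  
>>> in_progress = lf.remote(
...     context=ctx,
...     plan_type="plain",
...     n_retries=2,
...     engine="streaming",
... ).collect()  
>>> result = in_progress.await_result()  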

Explicitly run a query in distributed mode.

>>> lf = (
...     pl.scan_parquet("s3://my_bucket/").group_by("key").agg(pl.sum("values"))
... )
>>> in_progress = lf.remote().distributed().collect()  
>>> in_progress.await_result()  
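
Explicitly pin a query to a single machine with remote().single_node() (see the scaling_mode parameter above). A minimal sketch reusing lf from the previous example, shown without output.

>>> # force single-node execution instead of distributed mode
>>> in_progress = lf.remote().single_node().collect()  
>>> result = in_progress.await_result()  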