polars.LazyFrame.remote
LazyFrame.remote(
    context: pc.ComputeContext | None = None,
    *,
    plan_type: pc._typing.PlanTypePreference = 'dot',
    n_retries: int = 0,
    engine: pc._typing.Engine = 'auto',
    scaling_mode: pc._typing.ScalingMode = 'auto',
)
Run a query remotely on Polars Cloud.
This allows you to run Polars remotely on one or more workers via several strategies for distributed compute.
Read more in the Announcement post
- Parameters:
- context
Compute context in which queries are executed. If none given, it will take the default context.
- plan_type: {‘plain’, ‘dot’}
Whether to render the logical plan as a dot diagram or as plain text.
- n_retries
How many times a stage should be retried on failure.
- engine: {‘auto’, ‘streaming’, ‘in-memory’}
A hint that tells Polars which engine to prefer. The hint is not guaranteed to be respected.
- scaling_mode: {‘auto’, ‘single-node’, ‘distributed’}
If set to auto, a query that doesn’t explicitly specify a scaling mode via remote().distributed() or remote().single_node() will run in distributed mode if the cluster has more than one node.
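The scaling_mode rule above can be sketched in plain Python. This is only an illustration of the documented behaviour, not Polars code: `resolve_scaling_mode`, its `explicit` argument, and `n_nodes` are hypothetical names. The sketch also assumes that an explicit `remote().distributed()` or `remote().single_node()` call takes precedence over the `scaling_mode` argument, and that 'auto' falls back to single-node on a one-node cluster; neither point is stated in the text above.

```python
from typing import Optional

VALID_MODES = ("auto", "single-node", "distributed")


def resolve_scaling_mode(
    scaling_mode: str, explicit: Optional[str], n_nodes: int
) -> str:
    """Hypothetical helper (not part of Polars) mirroring the documented rule.

    scaling_mode: value passed to remote(), one of VALID_MODES.
    explicit: "distributed" or "single-node" if the query chose a mode via
        remote().distributed() / remote().single_node(), else None.
    n_nodes: number of nodes in the cluster.
    """
    if scaling_mode not in VALID_MODES:
        raise ValueError(f"unknown scaling_mode: {scaling_mode!r}")
    # Assumption: an explicit choice on the query wins over the argument.
    if explicit is not None:
        return explicit
    # A non-auto argument is used as given.
    if scaling_mode != "auto":
        return scaling_mode
    # Documented 'auto' rule: distributed when the cluster has more than 1 node.
    # Assumption: otherwise the query runs on the single available node.
    return "distributed" if n_nodes > 1 else "single-node"


if __name__ == "__main__":
    print(resolve_scaling_mode("auto", None, 3))
    print(resolve_scaling_mode("auto", None, 1))
    print(resolve_scaling_mode("auto", "single-node", 4))
```

Under these assumptions, a caller who wants a predictable mode regardless of cluster size would pass scaling_mode explicitly or call remote().distributed() or remote().single_node() on the query rather than rely on 'auto'.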
Examples
Run a query on a cloud instance.
>>> lf = pl.LazyFrame([1, 2, 3]).sum()
>>> in_progress = lf.remote().collect()
>>> # do some other work
>>> in_progress.await_result()
shape: (1, 1)
┌──────────┐
│ column_0 │
│ ---      │
│ i64      │
╞══════════╡
│ 6        │
└──────────┘
Explicitly run a query in distributed mode.
>>> lf = (
...     pl.scan_parquet("s3://my_bucket/").group_by("key").agg(pl.sum("values"))
... )
>>> in_progress = lf.remote().distributed().collect()
>>> in_progress.await_result()