AWS Lambda
Orchestrate Polars Cloud queries using AWS-native serverless infrastructure through EventBridge and Lambda. This section details how to implement scheduled query execution without infrastructure management by submitting workloads to Polars Cloud via Lambda functions while leveraging AWS Secrets Manager for secure credential handling.
Tip
The process that submits a query does not need to stay alive while the query runs, provided the Polars Cloud compute context is *not* used as a regular Python context manager.
Lambda function
The first hurdle is giving the Lambda function an environment that includes the Polars dependencies. This can be done in several ways, all documented by AWS here. The most common approach is to create a zip package that includes the dependencies.
The code for the Lambda function can be boiled down to the following (pseudo-code):
import boto3
import polars as pl
import polars_cloud as pc

client = boto3.client("secretsmanager")

# authenticate to Polars Cloud with the secrets created above
pc.authenticate(
    client_id=client.get_secret_value(SecretId="<SECRET>").get("SecretString"),
    client_secret=client.get_secret_value(SecretId="<SECRET>").get("SecretString"),
)

# define the compute context
cc = pc.ComputeContext(cpus=2, memory=4)

# submit the query
pl.scan_csv(...).remote(cc).sink_parquet(...)
Once the query is submitted, the Lambda function exits gracefully, leaving the rest of the handling to Polars Cloud.
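The pseudo-code above reads Secrets Manager once per credential. As a minimal sketch, assuming both values are stored together in a single JSON secret under the hypothetical keys `client_id` and `client_secret` (the guide does not specify the secret layout), one lookup can supply both arguments. The helper names `parse_credentials` and `load_credentials` are illustrative and not part of any library.

```python
import json


def parse_credentials(secret_string: str) -> tuple[str, str]:
    """Split a JSON secret into (client_id, client_secret).

    Assumes the secret is a JSON object with the hypothetical keys
    "client_id" and "client_secret".
    """
    payload = json.loads(secret_string)
    return payload["client_id"], payload["client_secret"]


def load_credentials(secrets_client, secret_id: str) -> tuple[str, str]:
    """Fetch the secret once and return both values.

    `secrets_client` is expected to behave like
    boto3.client("secretsmanager"): get_secret_value returns a dict
    that carries the secret text under "SecretString".
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return parse_credentials(response["SecretString"])
```

Inside the Lambda function, the result could then be passed on as `pc.authenticate(client_id=client_id, client_secret=client_secret)` after calling `load_credentials(boto3.client("secretsmanager"), "<SECRET>")`. Taking the client as a parameter keeps the helper testable without AWS access.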
Triggering rule
Because no dedicated orchestrator (such as Airflow) is involved here, the schedule is instead defined as a triggering rule in AWS EventBridge. Rules can be defined via the AWS Console (point-and-click) or via the AWS CLI, as documented here. A simple cron rule should be enough to trigger your query at a given interval.
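Besides the Console and the CLI, the same rule can be created from Python with boto3. The following is a rough sketch rather than a definitive setup: the function name `create_schedule`, the target id `polars-cloud-query`, and the default schedule (06:00 UTC daily) are assumptions made for illustration. The clients are passed in and are expected to be `boto3.client("events")` and `boto3.client("lambda")`.

```python
def create_schedule(events_client, lambda_client, rule_name, function_arn,
                    schedule="cron(0 6 * * ? *)"):
    """Create an EventBridge rule that invokes a Lambda function on a schedule.

    EventBridge cron expressions have six fields:
    minutes, hours, day-of-month, month, day-of-week, year.
    The default here fires every day at 06:00 UTC.
    """
    # Create (or update) the scheduled rule and keep its ARN.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Attach the Lambda function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "polars-cloud-query", "Arn": function_arn}],
    )
    return rule_arn
```

Note that `add_permission` raises an error if a statement with the same id already exists, so this sketch is meant for a one-time setup rather than repeated runs.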