Version 0.19
Breaking changes
Aggregation functions no longer support horizontal computation
This impacts aggregation functions like `sum`, `min`, and `max`. These functions were overloaded to support both vertical and horizontal computation. Recently, dedicated functionality for horizontal computation was released, and the horizontal behavior of the overloaded functions was deprecated. Restore the old behavior by using the horizontal variant, e.g. `sum_horizontal`.
Example
Before:
>>> df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
>>> df.select(pl.sum('a', 'b')) # horizontal computation
shape: (2, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 12 │
│ 14 │
└─────┘
After:
>>> df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
>>> df.select(pl.sum('a', 'b')) # vertical computation
shape: (1, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3 ┆ 23 │
└─────┴─────┘
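To keep a row-wise result, use the dedicated horizontal variant instead. A minimal sketch, assuming the same DataFrame as above:

import polars as pl

df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
# sum_horizontal sums across the listed columns for each row
df.select(pl.sum_horizontal('a', 'b'))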
Update to `all` / `any`
`all` will now ignore null values by default, rather than treating them as `False`.

For both `any` and `all`, the `drop_nulls` parameter has been renamed to `ignore_nulls` and is now keyword-only. We also fixed an issue where setting this parameter to `False` would erroneously result in `None` output in some cases.

To restore the old behavior, set `ignore_nulls` to `False` and check for `None` output.
Example
Before:
>>> pl.Series([True, None]).all()
False
After:
>>> pl.Series([True, None]).all()
True
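A minimal sketch of opting back into null-aware results with the renamed, keyword-only parameter; the expected result is noted as a comment rather than verbatim output:

import polars as pl

s = pl.Series([True, None])
# With ignore_nulls=False, a null makes the outcome indeterminate,
# so the result is None rather than True or False
result = s.all(ignore_nulls=False)
print(result)  # expected: None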
Improved error types for many methods
Improving our error messages is an ongoing effort. We did a sweep of our Python code base and made many improvements to error messages and error types. Most notably, many `ValueError`s were changed to `TypeError`s.
If your code relies on handling Polars exceptions, you may have to make some adjustments.
Example
Before:
>>> pl.Series(values=15)
...
ValueError: Series constructor called with unsupported type; got 'int'
After:
>>> pl.Series(values=15)
...
TypeError: Series constructor called with unsupported type 'int' for the `values` parameter
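If you previously caught `ValueError` around such calls, a minimal sketch of a transitional handler that covers both the old and new error types (both are standard Python exceptions; nothing Polars-specific is assumed):

import polars as pl

try:
    s = pl.Series(values=15)  # unsupported input type
except (ValueError, TypeError) as exc:
    # Catching both keeps the code working across pre- and post-0.19 versions
    print(f"Invalid Series input: {exc}")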
Updates to expression input parsing
Methods like `select` and `with_columns` accept one or more expressions. But they also accept strings, integers, lists, and other inputs that we try to interpret as expressions. We updated our internal logic to parse inputs more consistently.
Example
Before:
>>> pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
└─────┘
After:
>>> pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 2)
┌─────┬─────────┐
│ a ┆ literal │
│ --- ┆ --- │
│ i64 ┆ null │
╞═════╪═════════╡
│ 1 ┆ null │
│ 2 ┆ null │
└─────┴─────────┘
`shuffle` / `sample` now use an internal Polars seed
If you used the built-in Python `random.seed` function to control the randomness of Polars expressions, this will no longer work. Instead, use the new `set_random_seed` function.
Example
Before:
import random
random.seed(1)
After:
import polars as pl
pl.set_random_seed(1)
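A minimal sketch of how the new seed is intended to be used, assuming `shuffle` falls back to the Polars-level seed when no explicit `seed` argument is passed:

import polars as pl

pl.set_random_seed(1)
s = pl.Series('a', [1, 2, 3, 4, 5])
# With the seed set above, repeated runs produce the same shuffled order
print(s.shuffle())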
Deprecations
Creating a consistent and intuitive API is hard; finding the right name for each function, method, and parameter might be the hardest part. The new version comes with several naming changes, and you will most likely run into deprecation warnings when upgrading to 0.19.
If you want to upgrade without worrying about deprecation warnings right now, you can add the following snippet to your code:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
`groupby` renamed to `group_by`
This is not a change we make lightly, as it will impact almost all our users. But "group by" is really two different words, and our naming strategy dictates that these should be separated by an underscore.
Most likely, a simple search and replace will be enough to take care of this update:
- Search: `.groupby(`
- Replace: `.group_by(`
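For example, a minimal sketch of the rename on a small toy DataFrame (data chosen for illustration only):

import polars as pl

df = pl.DataFrame({'group': ['x', 'x', 'y'], 'value': [1, 2, 3]})

# Before (deprecated): df.groupby('group').agg(pl.sum('value'))
df.group_by('group').agg(pl.sum('value'))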
`apply` renamed to `map_*`
`apply` is probably the most misused part of our API. Many Polars users come from pandas, where `apply` has a completely different meaning.

We now consolidate all our functionality for user-defined functions under the name `map`. This results in the following renaming:
| Before | After |
|---|---|
| `Series/Expr.apply` | `map_elements` |
| `Series/Expr.rolling_apply` | `rolling_map` |
| `DataFrame.apply` | `map_rows` |
| `GroupBy.apply` | `map_groups` |
| `apply` | `map_groups` |
| `map` | `map_batches` |
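As one example from the table, a minimal sketch of moving from `Expr.apply` to `Expr.map_elements` on a toy column; the `return_dtype` argument is optional here and only included for explicitness:

import polars as pl

df = pl.DataFrame({'a': [1, 2, 3]})

# Before (deprecated): df.select(pl.col('a').apply(lambda x: x * 2))
df.select(pl.col('a').map_elements(lambda x: x * 2, return_dtype=pl.Int64))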