rs-0.51.0
📦 polars · Breaking Changes
⚠ 1 breaking · ✨ 19 features · 🐛 75 fixes · ⚡ 1 deprecation · 🔧 17 symbols
Summary
This release extends the streaming engine with native nodes for diff, cum_*, and peak min/max, adds features such as unique / n_unique / arg_unique support for array columns and a user guide section on AWS role assumption, and fixes numerous bugs, notably around serialization and type handling. The one breaking change makes some previously eager expressions lazy-compatible.
⚠️ Breaking Changes
- Removed, deprecated, or changed eager Expressions (Exprs) to be lazy compatible. Users relying on eager evaluation of expressions that are now lazy may see changes in execution behavior or errors if they expected immediate results.
Migration Steps
- Review code that relied on eager evaluation of Expressions (Exprs) and ensure compatibility with the new lazy evaluation model, especially if immediate results were expected.
✨ New Features
- Added user guide section on AWS role assumption (#24421).
- Support for unique / n_unique / arg_unique for array columns (#24406).
- Support S3 virtual-hosted–style URI (#24405).
- Support Partitioning sinks in cloud (#24399).
- Add Polars security policy (#24314).
- Allow pl.Expr.log to take in an expression (#24226).
- Implement diff() in streaming engine (#24189).
- Enable Expr.diff(n) for negative n (#24200).
- Allow upcasting null-typed columns to nested column types in scans (#24185).
- Add cum_* as native streaming nodes (#23977).
- Add peak_{min,max} support for booleans (#24068).
- Add DataFrame.map_columns for eager evaluation (#23821).
- Add native streaming for peaks_{min,max} (#24039).
- Add DataTypeExpr.default_value (#23973).
- Add support for Int128 to pyo3-polars (#23959).
- Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812).
- Dispatch scan_iceberg to native by default (#23912).
- Implement dt.days_in_month function (#23119).
- Reinterpret binary data to fixed size numerical array (#22840).
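To make the negative-`n` case of `Expr.diff(n)` (#24200) concrete, here is a plain-Python sketch of the semantics one would expect, assuming `diff(n)` is equivalent to subtracting the series shifted by `n`. This is an assumption about behavior, not code from Polars, and the helper name `diff` is purely illustrative.

```python
from typing import Optional, Sequence


def diff(values: Sequence[Optional[float]], n: int = 1) -> list[Optional[float]]:
    """Return values[i] - values[i - n]; None where the partner index is out of range."""
    out: list[Optional[float]] = []
    for i, v in enumerate(values):
        j = i - n  # for negative n this looks ahead instead of back
        if 0 <= j < len(values) and v is not None and values[j] is not None:
            out.append(v - values[j])
        else:
            out.append(None)
    return out


forward = diff([1, 3, 6, 10], 1)    # difference to the previous element
backward = diff([1, 3, 6, 10], -1)  # difference to the next element
```

Under this assumption, a positive `n` leaves the leading entries empty, while a negative `n` leaves the trailing entries empty.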
🐛 Bug Fixes
- Fix AggState on all_literal in BinaryExpr (#24461).
- Replace unsafe with collect (#24494).
- Fix schema on ApplyExpr with single row literal in agg context (#24422).
- Fix planner schema for dividing pl.Float32 by int (#24432).
- Fix panic scanning from AWS legacy global endpoint URL (#24450).
- Emit proper tuple for Log in expression nodes (#24426).
- Do not propagate struct of nulls with null (#24420).
- Be stricter with invalid NDJSON input when ignore_errors=False (#24404).
- Implement approx_n_unique for temporal dtypes and Null (#24417).
- Correct sink_ipc overload for compression (#24398).
- Enable all integer dtypes for by parameter in join_asof (#24384).
- Fix group-by + filter aggregation performing subsequent operations on all data instead of only the filtered data (#24373).
- Fix incorrect output ordering for row-separable exprs (#24354).
- Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120).
- Match output type to engine for Struct arithmetic (#23805).
- Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343).
- Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338).
- Incorrect logic in negative streaming slice (#24326).
- Do not error on non-list Sequence for columns parameter in read_excel (#23967).
- Invalid conversion from non-bit numpy bools (#24312).
- Make dt.epoch('s') serializable (#24302).
- Make Expr.rechunk serializable (#24303).
- Schema mismatch for 'log' operation (#24300).
- Incorrect first/last aggregate in streaming engine (#24289).
- Fix group offsets in sliced groups (#24274).
- Panic in inexact date(time) conversion (#24268).
- The index_of feature should not depend on the object feature (#24256).
- Keep DSL cache after serialization and deserialization (#24265).
- Sanitize and warn about eval usage (#24262).
- Unique with keep="none" in new optimization pass (#24261).
- Correct size limits for Decimal cast (#24252).
- Unordered unions in check order observing pass (#24253).
- Fix dtype for slice on Literal in agg context (#24137).
- Fix incorrect filter(lit(True)) when scanning hive (#24237).
- In-memory group_by on 128-bit integers (#24242).
- Fix panic in gather inside groupby with invalid indices (#24182).
- Release the GIL in map_groups (#24225).
- Remove extra explode in LazyGroupBy.{head,tail} (#24221).
- Fix panic in polars cloud CSV scan (#24197).
- Fix panic when loading categorical columns from IO plugin (#24205).
- Fix engine type for concat_list on AggScalar implode (#24160).
- Handle centered weights in rolling_mean when len(values) < window_size (#24158).
- Reading is_in predicate for Parquet plain strings (#24184).
- Make PyCategories pickleable (#24170).
- Remove unused unsound function to_mutable_slice (#24173).
- PyO3 extension types giving compat_level errors (#24166).
- Allow non-elementwise by in top_k (#24164).
- Fix sort_by for group_by_dynamic context (#24152).
- Input-independent length aggregations in streaming (#24153).
- Release GIL when iterating df in to_arrow (#24151).
- Respect non-elementwise join_where conditions (#24135).
- Resolve schema mismatch for div on Boolean (#24111).
- Keep name when doing empty group-aware aggregation (#24098).
- Implode instead of reshape_list (#24078).
- Rolling mean with weights incorrect when min_samples < window_size (#23485).
- Allow merge_sorted for all types (#24077).
- Include datatypes in row_encode expression (#24074).
- Include UDF materialized type in serialization (#24073).
- Correct .rolling() output type for non-aggregations (#24072).
- Correct planner output schema for join_asof (#24071).
- Allow %B to work without specifying day (#24009).
- Correct output for fold and reduce (#24069).
- Ensure upcast operations on pl.Date default to microsecond precision (#23981).
- Planner output type for mean with strange input type (#24052).
- Scan of multiple sources with null datatype (#24065).
- Categorical in nested data in row encoding (#24051).
- Missing length update in builder for pl.Array repetition (#24055).
- Race condition in global categories init (#24045).
- Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044).
- Error when using named functions (#24041).
- Don't encode entire CategoricalMapping when going to Arrow (#24036).
- Fix cast on arithmetic with lit (#23941).
- Incorrect slice-slice pushdown (#24032).
- Dedup common cache subplan in IR graph (#24028).
- Allow join on Decimal in in-memory engine (#24026).
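One of the fixes above switches memory mapping from MAP_SHARED to MAP_PRIVATE (#24343). The release note does not state the motivation, so the following only illustrates what the flag changes at the OS level, not Polars internals. It is a Unix-only sketch using Python's standard `mmap` module and a temporary file created for the demonstration.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (illustrative content only).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"polars")
    path = f.name

with open(path, "r+b") as f:
    # MAP_PRIVATE gives a copy-on-write mapping: writes stay in this process.
    m = mmap.mmap(
        f.fileno(),
        0,
        flags=mmap.MAP_PRIVATE,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    m[0:1] = b"P"
    in_memory = bytes(m[:6])
    m.close()

# Re-read the file: the private write was never carried through to disk.
with open(path, "rb") as f:
    on_disk = f.read()

os.remove(path)
```

With MAP_SHARED the same write would be carried through to the underlying file, which is the difference the two flags control.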
🔧 Affected Symbols
pl.Expr, pl.Series.shift, pl.DataFrame.map_columns, pl.Expr.log, pl.Expr.diff, pl.Series.__arrow_c_stream__, pl.dt.epoch, pl.Expr.rechunk, pl.read_excel, pl.LazyGroupBy, pl.concat_list, pl.rolling_mean, pl.merge_sorted, pl.row_encode, pl.Date, pl.mean, pl.cast
⚡ Deprecations
- Added a deprecation warning for pl.Series.shift(Null) (#24114).
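A common way to locate deprecated call sites before they become errors is to promote deprecation warnings to exceptions in a test run. The sketch below uses only Python's standard `warnings` module; `legacy_call` is a hypothetical stand-in for any call that emits a `DeprecationWarning`, and the assumption that the deprecation above surfaces as that warning class is not confirmed by the release note.

```python
import warnings
from typing import Callable


def legacy_call() -> int:
    # Hypothetical stand-in for a library call that emits a deprecation warning.
    warnings.warn("this usage is deprecated", DeprecationWarning, stacklevel=2)
    return 1


def raises_on_deprecation(fn: Callable[[], object]) -> bool:
    """Run fn with DeprecationWarning promoted to an error; report whether it fired."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn()
        except DeprecationWarning:
            return True
    return False


flagged = raises_on_deprecation(legacy_call)
clean = raises_on_deprecation(lambda: 1)
```

Running a test suite under the same filter makes each deprecated usage fail loudly at its call site.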