rs-0.50.0
📦 polars
✨ 24 features · 🐛 79 fixes · 🔧 53 symbols
Summary
This release focuses heavily on performance improvements by lowering more operations to the streaming engine and optimizing various internal processes. It also introduces significant enhancements around Categorical/Enum types and fixes numerous bugs across expressions, I/O, and joins.
Migration Steps
- UDFs without `return_dtype` set now raise or warn. Ensure your UDFs specify `return_dtype` where needed.
✨ New Features
- Make `Selector` a concrete part of the DSL.
- Rework Categorical/Enum to use (Frozen)Categories.
- Expand on `DataTypeExpr`.
- Add scalar checks to range expressions.
- Expose `POLARS_DOT_SVG_VIEWER` to automatically dispatch to SVG viewer.
- Implement mean function in `arr` namespace.
- Implement `vec_hash` for `List` and `Array`.
- Add unstable `pl.row_index()` expression.
- Add Categories on the Python side.
- Implement partitioned sinks for the in-memory engine.
- Expose `IRFunctionExpr::Rank` in the python visitor.
- Expose `IRFunctionExpr::FillNullWithStrategy` in the python visitor.
- Allow cast to Categorical inside list.eval.
- Support `pathlib.Path` as source for `read/scan_delta()`.
- Enable a default set of `ScanCastOptions` for native `scan_iceberg()` (also listed under fixes).
- Pass payload in `ExprRegistry`.
- Support reading nanosecond/Int96 timestamps and schema evolved datasets in `scan_delta()`.
- Support row group skipping with filters when `cast_options` is given.
- Use `scan_parquet().collect_schema()` for `read_parquet_schema`.
- Add `dtype` parameter to `str.to_integer()`.
- Add `arr.slice`, `arr.head` and `arr.tail` methods to `arr` namespace.
- Add `is_close` method.
- Added `drop_nulls` option to `to_dummies`.
- Support comma as decimal separator for CSV write.
🐛 Bug Fixes
- Fix credential refresh logic.
- Fix `to_datetime()` fallible identification.
- Correct output datatype for `dt.with_time_unit`.
- Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields.
- Allow DataType expressions with selectors.
- Match output type to engine for `interpolate` on `Decimal`.
- Fix remaining bugs in `with_exprs_and_input` and pruning.
- Match output dtype to engine for `cum_sum_horizontal`.
- Fix field names for `pl.struct` in group-by.
- Fix output for `str.extract_groups` with empty string pattern.
- Match output type to engine for `rolling_map`.
- Fix incorrect join on single Int128 column for in-memory engine.
- Match output field name to lhs for `BusinessDaycount`.
- Correct the planner output datatype for `strptime`.
- Fix `with_exprs_and_input` for Sort and Scan.
- Revert to old behavior with `name.keep`.
- Fix panic loading from arrow `Map` containing timestamps.
- Fix selectors in the `self` part of `list.eval`.
- Fix output field dtype for `ToInteger`.
- Allow `decimal_comma` with `,` separator in `read_csv`.
- Fix handling of UTF-8 in `write_csv` to `IO[str]`.
- Fix selectors in `{Lazy,Data}Frame.filter`.
- Stop the `splitfields` iterator at EOL in the SIMD branch.
- Correct output datatype of `dt.year` and `dt.mil`.
- Fix `broadcast_rhs` logic in binary functions to correct `list.set_intersection` for `list[str]` columns.
- Order-preserving equi-join didn't always flush final matches.
- Fix ColumnNotFound error when joining on `col().cast()`.
- Fix agg groups on `when/then` in `group_by` context.
- Fix output type for `sign`.
- Apply `agg_fn` on `null` values in `pivot`.
- Remove nonsensical duration variance.
- Don't panic when sinking nested categorical to Parquet.
- Correctly set value count output field name.
- Fix casting of unused columns in `to_torch`.
- Allow inferring of hours-only timezone offset.
- Fix Categorical <-> str comparison with nulls.
- Honor `n=0` in all cases of `str.replace`.
- Remove arbitrary 25 item limit from implicit Python list -> Series infer.
- Relabel duplicate sequence IDs in distributor.
- Round-trip Enum and Categorical metadata in plugins.
- Fix incorrect `join_asof` with `by` followed by `head/slice`.
- Allow writing nested Int128 data to Parquet.
- Fix an assert failure in Enum serialization.
- Fix output type for `peak_min`/`peak_max`.
- Make Scalar Categorical, Enum and Struct values serializable.
- Preserve row order within partition when sinking parquet.
- Panic in `create_multiple_physical_plans` when branching from a single cache node.
- Prevent in-mem partition sink deadlock.
- Correctly handle null values when comparing structs.
- Make `fold`/`reduce`/`cum_reduce`/`cum_fold` serializable.
- Make `Expr.append` serializable.
- Fix float-by-float division dtype.
- Fix division on an empty DataFrame generating a null row.
- Fix partition sink `copy_exprs` and `with_exprs_and_input`.
- Fix hitting unreachable code with `pl.self_dtype`.
- Fix incorrect `min_samples` handling in rolling median with nulls.
- Make `Int128` roundtrippable via Parquet.
- Fix panic when common subplans contain IEJoins.
- Properly handle non-finite floats in `rolling_sum`/`rolling_mean`.
- Make `read_csv_batched` respect `skip_rows` and `skip_lines`.
- Always use `cloudpickle` for the python objects in cloud plans.
- Support string literals in `index_of()` on categoricals.
- Don't panic for `finish_callback` with nested datatypes.
- Support min/max aggregation for DataFrame/LazyFrame Categoricals.
- Fix var/moment dtypes.
- Fix `agg_groups` dtype.
- Clear `cached_schema` when `apply` changes the dtype.
- Allow structured conversion to/from numpy with Array types, preserving shape.
- Fix null handling in full-null `group_by_dynamic` mean/sum.
- Fix index calculation for `nearest` interpolation.
- Fix compilation failure with `--no-default-features` and `--features lazy,strings`.
- Parse Parquet footer length as an unsigned integer.
- Fix incorrect results with `group_by` aggregation on empty groups.
- Fix boolean `min()` in `group_by` aggregation (streaming).
- Respect data-model in `map_elements`.
- Properly join URI paths in `PlPath`.
- Ignore null values in `bitwise` aggregation on bools.
- Fix panic filtering after left join.
- Fix out-of-bounds index access in `str.replace`.
🔧 Affected Symbols
`Selector`, `Categorical`, `Enum`, `Expr.slice`, `any()`, `all()`, `ColumnTransform`, `DataTypeExpr`, `pl.row_index()`, `IRFunctionExpr::Rank`, `IRFunctionExpr::FillNullWithStrategy`, `list.eval`, `read/scan_delta()`, `scan_iceberg()`, `ExprRegistry`, `scan_parquet()`, `read_parquet_schema`, `str.to_integer()`, `arr.slice`, `arr.head`, `arr.tail`, `is_close`, `to_dummies`, `read_csv`, `write_csv`, `dt.with_time_unit`, `interpolate`, `cum_sum_horizontal`, `pl.struct`, `str.extract_groups`, `rolling_map`, `BusinessDaycount`, `strptime`, `list.set_intersection`, `col().cast()`, `when/then`, `pivot`, `to_torch`, `str.replace`, `join_asof`, `create_multiple_physical_plans`, `fold/reduce/cum_reduce/cum_fold`, `Expr.append`, `read_csv_batched`, `cloudpickle`, `index_of()`, `finish_callback`, `var/moment`, `agg_groups`, `group_by_dynamic`, `group_by`, `map_elements`, `PlPath`