py-1.23.0
📦 polars
✨ 17 features, 🐛 36 fixes, 🔧 21 affected symbols
Summary
This release focuses on performance, particularly for rolling operations and group-by workloads, and includes numerous bug fixes across data types, I/O, and SQL. New features include SQL `DELETE` support and several enhancements to the new streaming engine, such as multiscan slicing, row index support, and predicate-based filtering of hive files.
Migration Steps
- Rename the `credentials` parameter to `credential` in `CredentialProviderAzure` if you are using it.
- If you relied on `pl.datetime` or `pl.date` returning nulls for invalid inputs, note that they no longer silently produce nulls; invalid input is now rejected with an error.
✨ New Features
- Toggled projection pushdown for eager rolling.
- Added sampling to the new-streaming equi join to decide which side to build and which to probe.
- Implemented casting from `Int128` to `String`.
- Connected polars-cloud.
- Introduced Version DSL.
- Made user-facing binary formats mostly self-describing.
- Enabled predicate-based filtering of hive files in the new streaming engine.
- Added negative slicing to the new streaming multiscan.
- Allowed iterable of frames as input to `align_frames`.
- Implemented sorted flags for struct series.
- Added support for reading the Arrow `Map` type from Delta tables.
- Added a dedicated `remove` method for `DataFrame` and `LazyFrame`.
- Implemented `merge_sorted` for struct.
- Added positive slicing to the new streaming multiscan.
- Added SQL support for the `DELETE` statement.
- Added row index support to the new streaming multiscan.
- Improved `DataFrame` formatting in `explain` output.
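Slicing a multiscan means translating one slice over the concatenated inputs into per-file row ranges. A negative offset also needs the total row count, because it counts from the end. The sketch below is a hypothetical illustration of that bookkeeping, not Polars' implementation. `plan_slice` is an invented helper, and it assumes every file's row count is known up front (for example, from file metadata) and that `length` is non-negative.

```python
def plan_slice(file_row_counts, offset, length):
    """Hypothetical helper: map a slice over concatenated files to per-file ranges.

    Returns (file_index, start, stop) tuples with half-open, file-local row ranges.
    Assumes length >= 0 and that all row counts are known in advance.
    """
    total = sum(file_row_counts)
    # A negative offset counts from the end of the combined scan.
    start = offset + total if offset < 0 else offset
    start = max(0, min(start, total))
    stop = min(start + length, total)

    plan = []
    file_start = 0
    for i, n in enumerate(file_row_counts):
        file_stop = file_start + n
        # Intersect the global slice with this file's global row range.
        lo = max(start, file_start)
        hi = min(stop, file_stop)
        if lo < hi:
            plan.append((i, lo - file_start, hi - file_start))
        file_start = file_stop
    return plan
```

For example, with three files of 3, 4, and 5 rows, `plan_slice([3, 4, 5], -6, 4)` selects global rows 6 through 9. That is the last row of the second file and the first three rows of the third.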
🐛 Bug Fixes
- Method `dt.ordinal_day` now returns results based on the local timestamp instead of UTC.
- Used Kahan summation for rolling sum kernels, fixing numerical stability issues.
- Added scalar checks for `n` and `fill_value` parameters in `shift`.
- Upcast small integer dtypes for rolling sum operations.
- Prevented silent production of null values from invalid input to `pl.datetime` and `pl.date`.
- Allowed the dtype of a duration multiplied by a primitive to propagate in the IR schema.
- Fixed struct arithmetic broadcasting behavior.
- Fixed pathological performance and memory explosion when combining `rolling` with group-by.
- Fixed panic when projecting only row index from IPC file.
- Properly updated groups after `gather` in aggregation context.
- Fixed unequal DataFrame column heights from parquet hive scan with filter.
- Fixed a `ColumnNotFound` error when selecting `len()` after a semi/anti join.
- Merged Parquet nested and flat decoders.
- For time-zone-aware columns, `dt.offset_by` no longer discards the month and year components of an offset that also includes days.
- Fixed pickling of `polars.col` on Python versions <3.11.
- Fixed duplicate column names after a join when a column with the suffixed name was already present.
- Fixed performance regression for eager `join_where`.
- Fixed incorrect predicate pushdown for predicates referring to right-join key columns.
- Fixed panic in `to_physical` for series of arrays and lists.
- Resolved a deadlock caused by leaking in the `Connector` receiver's drop logic.
- Fixed incorrect result for `merge_sorted` with lexically ordered categoricals.
- Added `Int128` path for `join_asof`.
- Categorical min/max now returns String dtype rather than Categorical.
- Fixed the overflow check in the Sliced function.
- Fixed an `InvalidOperationError` raised when adding a struct field using a literal.
- Returned nulls for `is_finite`, `is_infinite`, and `is_nan` when the dtype is `pl.Null`.
- Accounted for minor change in new `connectorx` release.
- Fixed infinite recursion when broadcasting into struct `zip_outer_validity`.
- Resolved deadlock due to bad logic in new-streaming join sampling.
- Fixed incorrect result for `top_k`/`bottom_k` when the input is sorted.
- Fixed UTF-8 validation of nested string slice in Parquet.
- Raised error instead of panicking when casting a Series to a Struct with the wrong number of fields.
- Fixed panic in `strptime()` if `format` ends with '%'.
- Raised error instead of panicking for unsupported SQL operations.
- Fixed projection of only row index in new streaming IPC.
- Fixed projection count query optimization.
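The `dt.ordinal_day` fix matters for timestamps near midnight. The same instant can fall on different calendar days, and therefore on different days of the year, in UTC and in local time. The standard-library sketch below shows this using a fixed UTC-5 offset, chosen for simplicity instead of a named time zone. After this fix, `dt.ordinal_day` on a time-zone-aware column should follow the local value rather than the UTC one.

```python
from datetime import datetime, timedelta, timezone

# An instant a few hours after midnight UTC on New Year's Day...
instant = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)

# ...is still New Year's Eve in a UTC-5 zone (fixed offset used for simplicity).
local = instant.astimezone(timezone(timedelta(hours=-5)))

# Day of year computed from each wall-clock representation.
utc_ordinal = instant.timetuple().tm_yday    # 1
local_ordinal = local.timetuple().tm_yday    # 365, since 2023 is not a leap year
```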
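Rolling sums are usually computed incrementally: the kernel adds the value entering the window and subtracts the one leaving it. Rounding error introduced by a large value can therefore linger after that value has left the window. The sketch below contrasts a naive incremental kernel with a Kahan-compensated one in plain Python. It illustrates the general technique named in the fix, not Polars' actual kernel, and the function names are hypothetical. Positions before the window fills are returned as `None`.

```python
def rolling_sum_naive(values, window):
    """Hypothetical naive incremental rolling sum: add incoming, subtract outgoing."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total if i >= window - 1 else None)
    return out


def rolling_sum_kahan(values, window):
    """Same update scheme, but every add/subtract goes through Kahan compensation."""
    out = []
    total = 0.0
    comp = 0.0  # running compensation for low-order bits lost to rounding

    def add(x):
        nonlocal total, comp
        y = x - comp
        t = total + y
        comp = (t - total) - y  # what was lost when forming t
        total = t

    for i, v in enumerate(values):
        add(v)
        if i >= window:
            add(-values[i - window])  # subtracting is adding the negation
        out.append(total if i >= window - 1 else None)
    return out
```

With `values = [1e16, 1.0, 1.0]` and `window = 2`, the true sum of the last window is `2.0`. The naive kernel loses both `1.0` terms to rounding against `1e16`, while the compensated kernel recovers them.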
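Upcasting matters because a fixed-width accumulator wraps on overflow, so summing a window of `Int8` values in `Int8` can produce a badly wrong result. The plain-Python sketch below emulates 8-bit two's-complement wrapping. The helper names are hypothetical, and these notes do not state which wider integer type Polars upcasts to; Python's unbounded `int` stands in for it here.

```python
def wrap_int8(x):
    """Emulate two's-complement int8 overflow by wrapping into [-128, 127]."""
    return (x + 128) % 256 - 128


def rolling_sum_int8_no_upcast(values, window):
    # Accumulate in "int8", wrapping after every addition like a fixed-width kernel.
    out = []
    for i in range(window - 1, len(values)):
        acc = 0
        for v in values[i - window + 1 : i + 1]:
            acc = wrap_int8(acc + v)
        out.append(acc)
    return out


def rolling_sum_upcast(values, window):
    # Python ints stand in for a wider accumulator, so no wrapping occurs.
    return [sum(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]
```

For `[100, 100, 100]` with a window of 2, the 8-bit accumulator wraps `200` around to `-56`, while the upcast version returns `200` for each window.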
🔧 Affected Symbols
- `dt.ordinal_day`
- rolling sum kernels
- `shift`
- rolling sum operations
- `pl.datetime`
- `pl.date`
- `align_frames`
- struct series
- Delta
- `DataFrame`
- `LazyFrame`
- `CredentialProviderAzure`
- new streaming multiscan
- `DELETE` statement
- `dt.offset_by`
- `polars.col`
- `join`
- `join_where`
- `to_physical`
- `strptime()`
- `connectorx`