Change8

py-1.23.0

📦 polars
17 features · 🐛 36 fixes · 🔧 21 symbols

Summary

This release focuses heavily on performance improvements, especially around rolling operations and group-by scenarios, alongside numerous bug fixes across data types, I/O, and SQL functionality. New features include SQL DELETE support and enhanced streaming capabilities.

Migration Steps

  1. Rename the `credentials` parameter to `credential` in `CredentialProviderAzure` if you are using it.
  2. If you relied on invalid inputs to `pl.datetime` or `pl.date` silently producing nulls, update that code: this release no longer produces those nulls silently (see the corresponding entry under Bug Fixes).
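
For code that has to run both before and after the `credentials` to `credential` rename, one option is to pick the keyword by inspecting the constructor signature. The sketch below is only an illustration: `credential_kwarg`, `OldProvider`, and `NewProvider` are hypothetical names, and the two classes are stand-ins for the old and new signatures rather than the real `CredentialProviderAzure`.

```python
import inspect


def credential_kwarg(provider_cls, value):
    """Return {keyword: value} using whichever keyword the constructor accepts.

    Hypothetical helper, not part of Polars.
    """
    params = inspect.signature(provider_cls).parameters
    name = "credential" if "credential" in params else "credentials"
    return {name: value}


# Stand-ins for the old and new constructor signatures (assumed shapes).
class OldProvider:
    def __init__(self, *, credentials=None):
        self.value = credentials


class NewProvider:
    def __init__(self, *, credential=None):
        self.value = credential


token = object()  # placeholder for a real credential object
old = OldProvider(**credential_kwarg(OldProvider, token))
new = NewProvider(**credential_kwarg(NewProvider, token))
```

In real code the class passed to the helper would be `pl.CredentialProviderAzure`. If only the new release needs to be supported, renaming the keyword directly is simpler.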

✨ New Features

  • Toggled projection pushdown for eager rolling.
  • Added sampling to new-streaming equi join to decide between build/probe side.
  • Implemented i128 -> str cast.
  • Connected polars-cloud.
  • Introduced Version DSL.
  • Made user-facing binary formats mostly self-describing.
  • Filtered hive files using predicates in new streaming.
  • Added negative slicing to new streaming multiscan.
  • Allowed iterable of frames as input to `align_frames`.
  • Implemented sorted flags for struct series.
  • Supported reading arrow Map type from Delta.
  • Added a dedicated `remove` method for `DataFrame` and `LazyFrame`.
  • Implemented `merge_sorted` for struct.
  • Added positive slice for new streaming MultiScan.
  • Added SQL support for the `DELETE` statement.
  • Added row index to new streaming multiscan.
  • Improved DataFrame fmt in explain.

🐛 Bug Fixes

  • Method `dt.ordinal_day` now returns results based on the local timestamp instead of UTC.
  • Used Kahan summation for rolling sum kernels, fixing numerical stability issues.
  • Added scalar checks for `n` and `fill_value` parameters in `shift`.
  • Upcast small integer dtypes for rolling sum operations.
  • Prevented silent production of null values from invalid input to `pl.datetime` and `pl.date`.
  • Allowed a duration multiplied with a primitive to propagate in the IR schema.
  • Fixed struct arithmetic broadcasting behavior.
  • Fixed pathological `rolling + group-by` performance and memory explosion.
  • Fixed panic when projecting only row index from IPC file.
  • Properly updated groups after `gather` in aggregation context.
  • Fixed unequal DataFrame column heights from parquet hive scan with filter.
  • Fixed ColumnNotFound error selecting `len()` after semi/anti join.
  • Merged Parquet nested and flat decoders.
  • Method `dt.offset_by` no longer discards month and year info if day was included in offset for timezone-aware columns.
  • Fixed pickling of `polars.col` on Python versions <3.11.
  • Fixed duplicate column names after join if suffix was already present.
  • Fixed performance regression for eager `join_where`.
  • Fixed incorrect predicate pushdown for predicates referring to right-join key columns.
  • Fixed panic in `to_physical` for series of arrays and lists.
  • Resolved deadlock due to leaking in Connector recv drop.
  • Fixed incorrect result for `merge_sorted` with lexical categorical.
  • Added `Int128` path for `join_asof`.
  • Categorical min/max now returns String dtype rather than Categorical.
  • Fixed checking overflow in Sliced function.
  • Fixed adding a struct field using a literal which raised InvalidOperationError.
  • Methods `is_finite`, `is_infinite`, and `is_nan` now return nulls when dtype is `pl.Null`.
  • Accounted for minor change in new `connectorx` release.
  • Fixed infinite recursion when broadcasting into struct `zip_outer_validity`.
  • Resolved deadlock due to bad logic in new-streaming join sampling.
  • Fixed incorrect result for `top_k`/`bottom_k` when input is sorted.
  • Fixed UTF-8 validation of nested string slice in Parquet.
  • Raised error instead of panicking when casting a Series to a Struct with the wrong number of fields.
  • Fixed panic in `strptime()` if `format` ends with '%'.
  • Raised error instead of panicking for unsupported SQL operations.
  • Fixed projection of only row index in new streaming IPC.
  • Fixed projection count query optimization.
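
The Kahan summation mentioned in the rolling sum fix above can be sketched in plain Python. This is a generic textbook version of compensated summation, not the Polars kernel; `kahan_sum` and the test data are illustrative only.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 100_000

naive = 0.0
for v in values:
    naive += v  # each 1e-16 is rounded away against 1.0

compensated = kahan_sum(values)
reference = math.fsum(values)  # correctly rounded sum from the standard library

print(naive, compensated, reference)
```

In this run the naive loop never moves off `1.0`, while the compensated sum stays close to the `math.fsum` reference.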

🔧 Affected Symbols

`dt.ordinal_day`, rolling sum kernels, `shift`, rolling sum operations, `pl.datetime`, `pl.date`, `align_frames`, struct series, Delta, `DataFrame`, `LazyFrame`, `CredentialProviderAzure`, new streaming multiscan, `DELETE` statement, `dt.offset_by`, `polars.col`, `join`, `join_where`, `to_physical`, `strptime()`, `connectorx`