py-1.20.0
📦 polars
✨ 15 features · 🐛 35 fixes · ⚡ 1 deprecation · 🔧 22 symbols
Summary
This release focuses on performance improvements to streaming engine aggregations, serialization, and Rust/Python data conversion. It also adds new features such as SQL support for `NORMALIZE`, and improves Parquet handling and cloud storage integration.
Migration Steps
- If you pass the parameter of `str.to_decimal` positionally, update your calls to pass it as a keyword argument instead.
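To make the migration concrete, here is a minimal stdlib-only sketch of how a positional-to-keyword deprecation typically behaves. The function below is a hypothetical stand-in, not the polars implementation, and the parameter name `inference_length` is an assumption about the `str.to_decimal` signature that these notes do not confirm.

```python
import warnings


def to_decimal(*args, inference_length=100):
    """Hypothetical stand-in for a method whose parameter became keyword-only."""
    if len(args) > 1:
        raise TypeError("to_decimal takes at most one positional argument")
    if args:
        # Old call style: still accepted, but flagged as deprecated.
        warnings.warn(
            "passing inference_length positionally is deprecated; "
            "use a keyword argument",
            DeprecationWarning,
            stacklevel=2,
        )
        inference_length = args[0]
    return inference_length


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = to_decimal(50)                   # pre-migration call: warns
    new_style = to_decimal(inference_length=50)  # migrated call: no warning

print(old_style, new_style, len(caught))
```

Both calls return the same value; only the positional one records a `DeprecationWarning`, which is the signal to switch that call site to the keyword form.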
✨ New Features
- Added SQL support for the `NORMALIZE` string function.
- Added `allow_exact_matches` option to `join_asof`.
- Added `first`/`last` aggregations to the new streaming engine.
- Added Parquet Sink to the new streaming engine.
- Made automatic use of Azure storage account keys opt-in.
- Made `GroupsProxy`/`GroupsPosition` sliceable and cheaply cloneable.
- Added `str.normalize()` function.
- Allowed more group_by agg expressions in the new streaming engine.
- Added support for loading Excel Table objects by name.
- Added support for writing to file objects from `write_excel`.
- Added hint to error message for extra struct field in
- Added `index_of()` function to `Series` and `Expr`.
- Updated `sqlparser-rs`, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries.
- Added `cat.starts_with`/`cat.ends_with` functionality.
- Implemented CSV, IPC and NDJSON in the `MultiScanExec` node.
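The `allow_exact_matches` option listed above is easiest to understand with a small example. The sketch below is plain Python, not polars code, and it assumes a backward as-of match where "exact" means the right key equals the left key; the helper name `asof_backward` and the sample data are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def asof_backward(left_keys, right_keys, right_values, allow_exact_matches=True):
    """For each left key, pick the value of the closest earlier right row.

    `right_keys` must be sorted ascending. With allow_exact_matches=True a
    right key equal to the left key qualifies (<=); with False it must be
    strictly smaller (<).
    """
    out = []
    for key in left_keys:
        if allow_exact_matches:
            pos = bisect_right(right_keys, key)  # count of right keys <= key
        else:
            pos = bisect_left(right_keys, key)   # count of right keys < key
        out.append(right_values[pos - 1] if pos > 0 else None)
    return out


right_keys = [1, 5, 10]
right_values = ["a", "b", "c"]
left_keys = [0, 5, 7, 10]

with_exact = asof_backward(left_keys, right_keys, right_values)
without_exact = asof_backward(
    left_keys, right_keys, right_values, allow_exact_matches=False
)

print(with_exact)     # [None, 'b', 'b', 'c']
print(without_exact)  # [None, 'a', 'b', 'b']
```

The two results differ only for left keys 5 and 10, which have an equal key on the right side; disallowing exact matches makes those rows fall back to the previous right row.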
🐛 Bug Fixes
- Avoided blocking on async runtime when resolving cloud scans.
- Fixed `allow_invalid_certificates` being ignored in `storage_options`.
- Fixed incorrect output type for `map_groups` returning all-NULL column.
- Fixed `unique(maintain_order=True)` raising `InvalidOperationError` for null array.
- Prevented collapsing into a Nested Loop Join if the cross join maintains order.
- Prevented serialization of credentials provider.
- Fixed `Series.n_unique` raising for list of struct.
- Fixed incorrect top-k by sorted column, fixed `head()` returning extra rows.
- Added outer validity to AnyValueBufferTrusted for structs.
- Prevented partitioning group-by with non-scalar literals in agg.
- Fixed xor operation of selector with Expr.
- Fixed incorrect view buffer dedup.
- Only verify Parquet ConvertedType if no LogicalType is given.
- Validated length of `schema_overrides` in `read_csv`.
- Fixed `map_elements` ignoring `skip_nulls=True` for struct dtype.
- Added a check for MAP-GROUPS to the cloud-eligibility check.
- Fixed empty output of `to_arrow()` on filtered unit height DataFrame.
- Added `.default` to azure credential provider scope URL.
- Fixed `join_asof` panicking for invalid `tolerance` input.
- Fixed incorrect flag check on is_elementwise.
- Avoided a panic when the type is unknown by setting a null type instead.
- Fixed performance regression for DataFrame serialization/pickling.
- Fixed `Int128` dtype serialization.
- Ensured `read_excel` and `read_ods` support reading from raw `bytes` for all engines.
- Ensured that SQL `LIKE` and `ILIKE` operators support multi-line matches.
- Fixed broadcasting in `sort_by`.
- Fixed loading of nested Parquet statistics.
- Fixed AWS environment config not loading when credential provider was used.
- Fixed order observability of group-by-dyn.
- Ensured soundness when loading Parquet string statistics.
- Fixed error filtering after `with_columns()` on unit height LazyFrame.
- Propagated `tenant_id` to `CredentialProviderAzure` if given.
- Restored symbols on Apple by bumping nightly version.
- Fixed type annotation of `str.strip_chars_*` methods.
- Fixed variable name in error message for "unsupported data type" in rolling and upsampling operations.
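The `LIKE`/`ILIKE` multi-line fix above concerns a common pitfall when SQL patterns are translated to regular expressions. The following sketch uses Python's `re` module to show the general problem; it is an assumption that the polars issue had this shape, and `like_to_regex` and `like` are hypothetical helpers that ignore `ESCAPE` clauses.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Translate a SQL LIKE pattern into a compiled regex (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    # DOTALL lets "." match newlines, so "%" can span multiple lines.
    flags = re.DOTALL
    if case_insensitive:
        flags |= re.IGNORECASE
    return re.compile("".join(parts), flags)


def like(value, pattern, case_insensitive=False):
    return like_to_regex(pattern, case_insensitive).fullmatch(value) is not None


text = "first line\nsecond line"

# Without DOTALL, "." stops at the newline and the match fails.
naive_match = re.fullmatch("first.*line", text) is not None
like_match = like(text, "first%line")
ilike_match = like(text, "FIRST%LINE", case_insensitive=True)

print(naive_match, like_match, ilike_match)  # False True True
```

The naive translation rejects the two-line string because `.` does not cross `\n` by default, while the `DOTALL` version treats `%` as matching any characters, newlines included.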
🔧 Affected Symbols
`str.to_decimal`, `BitmapBuilder`, `Growables`, `SeriesTrait`, `ChunkedArray`, `DataFrame`, `Parquet`, `verify_dict_indices`, `storage_options`, `map_groups`, `unique`, `join_asof`, `Series.n_unique`, `map_elements`, `read_csv`, `Int128`, `read_excel`, `read_ods`, `sort_by`, `CredentialProviderAzure`, `str.strip_chars_*`, `LazyFrame.fill_null`
⚡ Deprecations
- The parameter of `str.to_decimal` is now keyword-only.