py-1.20.0

📅 Jan 16, 2025 · 📦 polars

✨ 15 features · 🐛 35 fixes · ⚡ 1 deprecation · 🔧 22 symbols

Summary

This release focuses on performance improvements in streaming engine aggregations, serialization, and Rust/Python data conversion. It also adds new features, including SQL support for `NORMALIZE`, and improves Parquet handling and cloud storage integration.

Migration Steps

If you were relying on the positional argument for `str.to_decimal`, update your calls to use keyword arguments instead.
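As a rough illustration of what this migration involves, the plain-Python sketch below uses a hypothetical stand-in function, `to_decimal_like`, with a keyword-only parameter. It is not the Polars implementation, and the parameter name `inference_length` is an assumption; check the `str.to_decimal` signature in your installed version.

```python
def to_decimal_like(values, *, inference_length=100):
    """Hypothetical stand-in for a method whose parameter is keyword-only.

    The bare `*` in the signature means everything after it must be
    passed by name. `inference_length` is an assumed parameter name.
    """
    return list(values), inference_length


# Forward-compatible call style: pass the argument by keyword.
_, used_length = to_decimal_like(["1.5", "2.25"], inference_length=10)

# A strictly keyword-only signature rejects the positional form.
try:
    to_decimal_like(["1.5", "2.25"], 10)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Since the change is listed as a deprecation, positional calls may keep working for a while with a warning; switching to the keyword form now avoids a later break.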

✨ New Features

Added SQL support for the `NORMALIZE` string function.
Added `allow_exact_matches` option to `join_asof`.
Added first/last aggregations to the new streaming engine.
Added Parquet Sink to the new streaming engine.
Made automatic use of Azure storage account keys opt-in.
Improved `GroupsProxy/GroupsPosition` to be sliceable and cheaply cloneable.
Added `str.normalize()` function.
Allowed more group_by agg expressions in the new streaming engine.
Support loading Excel Table objects by name.
Support writing to file objects from `write_excel`.
Added hint to error message for extra struct field in
Added `index_of()` function to `Series` and `Expr`.
Updated `sqlparser-rs`, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries.
Added `cat.starts_with`/`cat.ends_with` functionality.
Implemented CSV, IPC and NDJson in the `MultiScanExec` node.
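
To illustrate what the `allow_exact_matches` option listed above controls, here is a minimal plain-Python sketch of a backward as-of lookup. It assumes the option decides whether a right-hand key equal to the left-hand key counts as a match (less-than-or-equal versus strictly less-than). `asof_backward` is a hypothetical helper, not Polars code, and it ignores `by` groups, `tolerance`, and the other join strategies.

```python
from bisect import bisect_left, bisect_right


def asof_backward(left_keys, right_keys, right_values, allow_exact_matches=True):
    """Hypothetical backward as-of lookup over sorted right_keys.

    For each left key, return the value of the last right key that is
    <= the left key (exact matches allowed) or < the left key (not allowed).
    Returns None where no right key qualifies.
    """
    result = []
    for key in left_keys:
        if allow_exact_matches:
            idx = bisect_right(right_keys, key)  # count of right keys <= key
        else:
            idx = bisect_left(right_keys, key)   # count of right keys < key
        result.append(right_values[idx - 1] if idx > 0 else None)
    return result


right_keys = [1, 5, 10]
right_values = ["a", "b", "c"]

with_exact = asof_backward([5, 7, 1], right_keys, right_values)
without_exact = asof_backward(
    [5, 7, 1], right_keys, right_values, allow_exact_matches=False
)
```

In this sketch the two modes differ only for left keys that also appear on the right (5 and 1); the key 7 resolves to the same row either way.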

🐛 Bug Fixes

Avoided blocking on async runtime when resolving cloud scans.
Fixed `allow_invalid_certificates` being ignored in `storage_options`.
Fixed incorrect output type for `map_groups` returning all-NULL column.
Fixed `unique(maintain_order=True)` raising `InvalidOperationError` for null array.
Prevented collapsing into a Nested Loop Join if the cross join maintains order.
Prevented serialization of credentials provider.
Fixed `Series.n_unique` raising for list of struct.
Fixed incorrect top-k by sorted column and `head()` returning extra rows.
Added outer validity to AnyValueBufferTrusted for structs.
Prevented partitioning group-by with non-scalar literals in agg.
Fixed xor operation of selector with Expr.
Fixed incorrect view buffer dedup.
Only verify Parquet ConvertedType if no LogicalType is given.
Validated length of `schema_overrides` in `read_csv`.
Fixed `map_elements` ignoring `skip_nulls=True` for struct dtype.
Added a check for MAP-GROUPS when determining cloud eligibility.
Fixed empty output of `to_arrow()` on filtered unit height DataFrame.
Added `.default` to azure credential provider scope URL.
Fixed `join_asof` panicking for invalid `tolerance` input.
Fixed incorrect flag check on `is_elementwise`.
Avoided a panic by setting the null type when the type is unknown.
Fixed performance regression for DataFrame serialization/pickling.
Fixed `Int128` dtype serialization.
Ensured `read_excel` and `read_ods` support reading from raw `bytes` for all engines.
Ensured that SQL `LIKE` and `ILIKE` operators support multi-line matches.
Fixed broadcasting in `sort_by`.
Properly loaded nested Parquet Statistics.
Fixed AWS environment config not loading when credential provider was used.
Fixed order observability of group-by-dyn.
Ensured soundness when loading Parquet string statistics.
Fixed error filtering after `with_columns()` on unit height LazyFrame.
Propagated `tenant_id` to `CredentialProviderAzure` if given.
Restored symbols on Apple by bumping nightly version.
Fixed type annotation of `str.strip_chars_*` methods.
Fixed variable name in error message for "unsupported data type" in rolling and upsampling operations.
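
The `LIKE`/`ILIKE` fix above concerns patterns matched against strings that contain newlines. One way such a bug can arise, sketched here in plain Python and not taken from the Polars source, is translating `%` to a regex `.*` without letting `.` match newline characters. `like_to_regex` is a hypothetical helper, and it does not handle escape characters.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Hypothetical translation of a SQL LIKE pattern to a compiled regex.

    `%` matches any run of characters and `_` matches one character.
    re.DOTALL lets both wildcards match newlines as well.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


text = "first line\nsecond line"

# With DOTALL, the wildcard spans the newline.
like_matches = like_to_regex("first%line").fullmatch(text) is not None
ilike_matches = like_to_regex("FIRST%LINE", case_insensitive=True).fullmatch(text) is not None

# Without DOTALL, `.*` stops at the newline and the full match fails.
naive_matches = re.fullmatch("first.*line", text) is not None
```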

🔧 Affected Symbols

`str.to_decimal`, `BitmapBuilder`, `Growables`, `SeriesTrait`, `ChunkedArray`, `DataFrame`, `Parquet`, `verify_dict_indices`, `storage_options`, `map_groups`, `unique`, `join_asof`, `Series.n_unique`, `map_elements`, `read_csv`, `Int128`, `read_excel`, `read_ods`, `sort_by`, `CredentialProviderAzure`, `str.strip_chars_*`, `LazyFrame.fill_null`

⚡ Deprecations

The parameter of `str.to_decimal` is now keyword-only.