py-1.25.2

📅 Mar 15, 2025 · 📦 polars

✨ 20 features · 🐛 38 fixes · ⚡ 1 deprecation · 🔧 12 symbols

Summary

This release adds common subplan elimination across plans in collect_all and linear-time rolling operations. It also introduces lazy sinks and broadens the new streaming engine with an NDJSON source, cloud scans and sinks, and partitioned sinks (PartitionByKey, PartitionMaxSize).
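To make "linear-time rolling operations" concrete, here is a minimal pure-Python sketch of a rolling minimum that does O(n) total work regardless of window size. It uses the textbook monotonic-deque technique and is an illustration only; the release notes do not say how Polars implements its rolling kernels, and the helper name `rolling_min` is hypothetical.

```python
from collections import deque


def rolling_min(values, window):
    """Return the minimum of every full window of `window` consecutive values.

    Illustrative sketch only (not Polars' implementation). Each index is
    pushed and popped at most once, so the total work is O(len(values)).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    candidates = deque()  # indices whose values are strictly increasing
    out = []
    for i, v in enumerate(values):
        # Older candidates that are >= v can never be a window minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # Emit a result as soon as the first full window is available.
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out


result = rolling_min([4, 2, 5, 1, 3, 6], 3)
print(result)
```

A naive version recomputes `min` over each window and costs O(n · window); the deque keeps only the values that can still become a minimum.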

✨ New Features

Enable common subplan elimination across plans in collect_all
Add lazy sinks
Add PartitionByKey for new streaming sinks
Enable new streaming memory sinks by default
Add support for rolling_(sum/min/max) for booleans through casting
Support multi-column sort for all nested types and nested search-sorted
Add mkdir flag to sinks
Enable joins on list/array dtypes
Add a config option to specify the default engine to attempt to use during lazyframe calls
Support all elementwise functions in IO plugin predicates
Stabilize Enum datatype
Support Polars int128 in from arrow
Cloud support for new-streaming scans and sinks
Add len method to arr
Closeable files on unix
Add new PartitionMaxSize sink
Support engine callback for LazyFrame.profile
Dispatch new-streaming CSV negative slice to separate node
Add NDJSON source to new streaming engine
Support passing token in storage_options for GCP cloud

🐛 Bug Fixes

Expose and document partitions
Fix lazy schema for truediv ops involving List/Array dtypes
Fix error due to race condition in file cache
Clear NaNs due to zero-weight division in rolling var/std
Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata
Disallow cast from boolean to categorical/enum
Don't check sortedness in join_asof when 'by' groups supplied, but issue warning
Incorrect multithread path taken for aggregations
Disallow cast to empty Enum
Fix list.mean and list.median returning Float64 for temporal types
Incorrect (FixedSize)ListArrayBuilder gather implementation
Always fallback in SkipBatchPredicate
New streaming multiscan deadlock
Ensure new-streaming join BuildState is correct even if never fed morsels
IO plugin; support empty iterator
Support nulls in multi-column sort
Window function check length of groups state
Support 128 sum reduction on new streaming
IPC round-trip of list of empty view with non-empty bufferset
Variance can never be negative
Incorrect loop length in new-streaming group by
Right join on multiple columns not coalescing left_on columns
Casting Struct to String panics if n_chunks > 1
Fix Future attached to different loop error on read_database_uri
Fix deadlock in cache + hconcat
Properly handle phase transitions in row-wise sinks
Always use global registry for object
Check enum categories when reading csv
Unspecialized prefiltering on nullable arrays
Release the GIL on explain
Take into account scalar/partitioned columns in DataFrame::split_chunks
Bad null handling in unordered row encoding
Fix deadlock in new streaming CSV / NDJSON sinks
Bad view index in BinaryViewBuilder
Fix CSV count with comment prefix skipped empty lines
New streaming IPC enum scan
Several aspects related to ParquetColumnExpr
Don't hit parquet::pre-filtered in case of pre-slice

🔧 Affected Symbols

collect_all
rolling_min/max
InputIndependentSelect
InMemorySourceNode
rolling_(sum/min/max)
PartitionByKey
join_asof
list.mean
list.median
LazyFrame.profile
read_database_uri
DataFrame::split_chunks

⚡ Deprecations

Remove old-streaming from engine argument