py-1.25.2
📦 polars
✨ 20 features · 🐛 38 fixes · ⚡ 1 deprecation · 🔧 12 symbols
Summary
This release brings performance improvements, including common subplan elimination across plans in collect_all and linear-time rolling operations, alongside new features such as lazy sinks and broader support in the new streaming engine (cloud scans and sinks, an NDJSON source, and partitioned sinks).
✨ New Features
- Enable common subplan elimination across plans in collect_all
- Add lazy sinks
- Add PartitionByKey for new streaming sinks
- Enable new streaming memory sinks by default
- Add support for rolling_(sum/min/max) for booleans through casting
- Support multi-column sort for all nested types and nested search-sorted
- Add mkdir flag to sinks
- Enable joins on list/array dtypes
- Add a config option to specify the default engine to attempt to use during lazyframe calls
- Support all elementwise functions in IO plugin predicates
- Stabilize Enum datatype
- Support Polars int128 in from arrow
- Cloud support for new-streaming scans and sinks
- Add len method to arr
- Closeable files on unix
- Add new PartitionMaxSize sink
- Support engine callback for LazyFrame.profile
- Dispatch new-streaming CSV negative slice to separate node
- Add NDJSON source to new streaming engine
- Support passing token in storage_options for GCP cloud
🐛 Bug Fixes
- Expose and document partitions
- Fix lazy schema for truediv ops involving List/Array dtypes
- Fix error due to race condition in file cache
- Clear NaNs due to zero-weight division in rolling var/std
- Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata
- Disallow cast from boolean to categorical/enum
- Don't check sortedness in join_asof when 'by' groups supplied, but issue warning
- Incorrect multithread path taken for aggregations
- Disallow cast to empty Enum
- Fix list.mean and list.median returning Float64 for temporal types
- Incorrect (FixedSize)ListArrayBuilder gather implementation
- Always fallback in SkipBatchPredicate
- New streaming multiscan deadlock
- Ensure new-streaming join BuildState is correct even if never fed morsels
- IO plugin; support empty iterator
- Support nulls in multi-column sort
- Window function check length of groups state
- Support 128 sum reduction on new streaming
- IPC round-trip of list of empty view with non-empty bufferset
- Variance can never be negative
- Incorrect loop length in new-streaming group by
- Right join on multiple columns not coalescing left_on columns
- Casting Struct to String panics if n_chunks > 1
- Fix Future attached to different loop error on read_database_uri
- Fix deadlock in cache + hconcat
- Properly handle phase transitions in row-wise sinks
- Always use global registry for object
- Check enum categories when reading csv
- Unspecialized prefiltering on nullable arrays
- Release the gil on explain
- Take into account scalar/partitioned columns in DataFrame::split_chunks
- Bad null handling in unordered row encoding
- Fix deadlock in new streaming CSV / NDJSON sinks
- Bad view index in BinaryViewBuilder
- Fix CSV count with comment prefix skipped empty lines
- New streaming IPC enum scan
- Several aspects related to ParquetColumnExpr
- Don't hit parquet::pre-filtered in case of pre-slice
🔧 Affected Symbols
- collect_all
- rolling_min/max
- InputIndependentSelect
- InMemorySourceNode
- rolling_(sum/min/max)
- PartitionByKey
- join_asof
- list.mean
- list.median
- LazyFrame.profile
- read_database_uri
- DataFrame::split_chunks
⚡ Deprecations
- Remove old-streaming from engine argument