rs-0.54.4
📦 polars
✨ 44 features · 🐛 11 fixes · 🔧 53 symbols
Summary
This release focuses on performance improvements across many operations, stabilizes and extends the streaming engine, and brings numerous enhancements to SQL support and data source handling (such as Parquet and cloud storage).
Migration Steps
- In Polars SQL, the / operator now performs true division, so check any query that expects dividing two integers to produce an integer.
- If you rely on specific allocation behavior, review your use of default_alloc: it has been removed in favor of the fast_alloc feature flag.
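As a conceptual illustration of the first migration note, the sketch below contrasts true division with integer division in plain Python. It does not call Polars or Polars SQL; the helper names `true_div` and `int_div` are hypothetical, and the exact result dtype that Polars SQL produces is not stated in these notes.

```python
# Plain-Python illustration only; this is NOT the Polars SQL implementation.

def true_div(a: int, b: int) -> float:
    """True division: the fractional part of the quotient is kept."""
    return a / b


def int_div(a: int, b: int) -> int:
    """Integer division: the fractional part of the quotient is dropped."""
    return a // b


print(true_div(7, 2))  # 3.5
print(int_div(7, 2))   # 3
```

A query that divides two integer columns and then compares or casts the result as an integer is the kind of place worth re-checking after upgrading.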
✨ New Features
- Added LazyFrame.gather functionality.
- Implemented nested common subplan elimination.
- Stabilized the streaming engine.
- Improved parquet metadata decoding speed using a hand-written Thrift implementation.
- Added streaming support for grouped AsOf join.
- Exposed fixed-size rolling window expressions in Python visitor.
- Exposed IR::Scan hive parts in the python node visitor.
- Exposed IRFunctionExpr::DynamicPred in the python visitor.
- Added pinning and queuing logic to polars-ooc.
- Added tiered multi-file parquet metadata resolver.
- Added caching and shuffling for DNS in cloud object_store.
- Allowed deeper expressions in optimization.
- Added is_inherently_nondeterministic helper for AExpr.
- Used true division for the / operator in Polars SQL.
- Added Rust backend for Expr.has_nulls.
- Added block_in_place to Polars' async executor.
- Stabilized float16 operations.
- Added Expr.is_empty.
- Added support for the SQL FILTER clause for aggregate functions, and STRING_AGG.
- Made parquet FileMetadata prunable for IR-plan dispatch.
- Broadcasted scalar input for list.slice.
- Added null_on_oob parameter in {Expr/Series}.gather.
- Added maintain_order parameter to merge_sorted.
- Added ignore_nulls to {list,arr}.{any,all}.
- Added is_unique to list/array dtypes.
- Added pl.merge_sorted operating on multiple frames.
- Added the fast_alloc feature flag and removed default_alloc.
- Added a GPU slot to OptFlags to control CSE.
- Allowed group_by() without key exprs.
- Used UUIDv7 for sink_iceberg directory name generation.
- Supported Delta deletion vectors in scan_delta.
- Supported Decimal32/64 in scan_parquet.
- Supported casting Duration to String in ISO 8601 format.
- Added a streaming range-join.
- Supported Expr for holidays in business day calculations.
- Added a pivot parameter to always include the value column name.
- Extended Expr.reinterpret to all numeric types of the same size.
- Added missing_columns parameter to scan_csv.
- Supported nested datatypes for {min,max}_by.
- Supported SQL ARRAY init from typed literals.
- Accepted table identifier string in scan_iceberg().
- Added an unstable LazyFrame.sink_iceberg.
- Added a maintain-order argument to implode.
- Implemented predicate pushdown for aliased groupby keys.
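Two of the features above concern merging already-sorted inputs: pl.merge_sorted on multiple frames and the maintain_order parameter. The sketch below shows the underlying idea, an order-preserving k-way merge, using only Python's standard library. It is an assumption-laden illustration, not the Polars implementation or its API: the helper `merge_sorted_rows` and the tuple-based "rows" are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (sort key, label of the source table); illustrative only


def merge_sorted_rows(*tables: Iterable[Row]) -> List[Row]:
    """Merge several tables that are each already sorted by their first field.

    heapq.merge breaks ties by the position of the input, so rows with equal
    keys come out in the order their tables were passed in.
    """
    return list(heapq.merge(*tables, key=lambda row: row[0]))


a = [(1, "a"), (3, "a")]
b = [(1, "b"), (2, "b")]
c = [(2, "c"), (3, "c")]

print(merge_sorted_rows(a, b, c))
# [(1, 'a'), (1, 'b'), (2, 'b'), (2, 'c'), (3, 'a'), (3, 'c')]
```

The merge never re-sorts the inputs; it only repeatedly picks the smallest head element, which is why the inputs must already be sorted on the key.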
🐛 Bug Fixes
- Fixed SchemaError when using lazy HConcat->Sink.
- Made is_in row-group pruning precise on null-containing haystacks.
- Made is_in row-group pruning precise on multi-value lists.
- Avoided unnecessary rechunk when sorting already sorted DataFrame.
- Optimized drop_nulls().{first,last}() to {first,last}(ignore_nulls=True).
- Made cut output Enum and marked as elementwise.
- Removed unused expression sorts.
- Dropped unused filter column above cache.
- Optimized .replace() from a single value.
- Error raised in .collect_schema() when arr.get() is out-of-bounds.
- Cleared no-op scan projections.
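One of the items above rewrites drop_nulls().{first,last}() to {first,last}(ignore_nulls=True). A rough way to see why such a rewrite can help is sketched below in plain Python, with `None` standing in for null. Both helper functions are hypothetical and do not reflect how Polars executes either form; the point is only that the second variant gives the same answer without building the filtered intermediate.

```python
from typing import Optional, Sequence


def first_via_drop_nulls(values: Sequence[Optional[int]]) -> Optional[int]:
    """Drop every null first, then take the first remaining element."""
    non_null = [v for v in values if v is not None]  # materializes a full copy
    return non_null[0] if non_null else None


def first_ignore_nulls(values: Sequence[Optional[int]]) -> Optional[int]:
    """Scan until the first non-null element and stop there."""
    return next((v for v in values if v is not None), None)


data = [None, None, 3, None, 5]
print(first_via_drop_nulls(data))  # 3
print(first_ignore_nulls(data))    # 3
```

The same reasoning applies to the last element by scanning from the end instead of the start.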
Affected Symbols
LazyFrame.gather, ScalarColumn, array.shift, list.sample, agg_n_unique, list.unique, arr.unique, list.n_unique, arr.n_unique, list.reverse, arr.reverse, Series::is_sorted, list.shift, json_decode, to_numpy, select(len()), join, is_in, FunctionExpr, IRColumnarFunctionNode, __array_ufunc__, drop_{nulls,nans}, group_by, entropy, interpolate, strptime, skew, kurtosis, cut, Expr.has_nulls, Expr.is_empty, STRING_AGG, Expr.gather, Series.gather, list.any, list.all, arr.any, arr.all, pl.merge_sorted, sink_delta, index_of, backward_fill, forward_fill, arg_{min,max}, scan_csv, scan_ndjson, scan_lines, scan_delta, scan_parquet, scan_iceberg, Expr.reinterpret, pivot, implode