py-1.35.0-beta.1
📦 polars
✨ 18 features · 🐛 42 fixes · 🔧 61 symbols
Summary
This release focuses heavily on performance improvements across group-by operations, data serialization, and string parsing. Numerous enhancements were added, including new aggregation methods and improved ADBC engine integration, alongside fixes for various edge cases and regressions.
Migration Steps
- Series initialization now handles string values declared with temporal dtypes the same way DataFrame initialization does. Review any code that relied on the previous Series-specific behavior.
- If you were using 'rolling' groups, they have been extended and renamed to 'overlapping'.
✨ New Features
- Added environment variable to roundtrip empty struct in Parquet.
- Implemented fast-count for scan_iceberg().select(len()).
- Added 'glob' parameter to scan_ipc.
- Added list.agg and arr.agg functionality.
- Implemented {Expr,Series}.rolling_rank().
- Made Series initialization consistent with DataFrame initialization for string values declared with temporal dtype.
- Added support for MergeSorted in CSPE.
- Implemented cumulative_eval using the group-by engine.
- Implemented native null_count, any and all group-by aggregations.
- Added streaming engine per-node metrics.
- Added arr.eval.
- Added 'separator' to {Data,Lazy}Frame.unnest.
- Added union() function for unordered concatenation.
- Added name.replace to the set of column rename options.
- Added support for np.ndarray -> AnyValue conversion.
- Allowed duration strings with an extra leading "+".
- Added support for UInt128 to pyo3-polars.
- Added Expr.sign for Decimal datatype.
🐛 Bug Fixes
- Addressed group_by_dynamic slowness in sparse data.
- Pushed filters to PyIceberg.
- Implemented native filter/drop_nulls/drop_nans in group-by context.
- Prevented generation of copies of DataFrames in DslPlan serialization.
- Sped up reverse in group-by context.
- Pruned unused categorical values when exporting to arrow/parquet/IPC/pickle.
- Stopped checking duplicates on streaming simple projection in release mode.
- Lowered approx_n_unique to the streaming engine.
- Optimized Duration/interval string parsing (2-5x faster).
- Used native reducer for first/last on Decimals, Categoricals and Enums.
- Implemented indexed method for BitMapIter::nth.
- Pushed down slices on plans within unions.
- Properly released the GIL for read_parquet_metadata.
- Broadcasted partition_by columns in over expression.
- Cleared index cache on stacked df.filter expressions.
- Fixed 'explode' mapping strategy on scalar value.
- Fixed repeated with_row_index() after scan() being silently ignored.
- Correctly returned min and max for enums in group-by aggregation.
- Refactored BinaryExpr in group_by dispatch logic.
- Fixed aggstate for gather.
- Kept scalars for length preserving functions in group_by.
- Fixed duplicate select panic.
- Fixed inconsistency of list.sum() result type with None values.
- Fixed division by zero in Expr.dt.truncate.
- Fixed potential deadlock in __arrow_c_stream__.
- Allowed double aggregations in group-by contexts.
- Fixed Series.shrink_dtype for i128/u128.
- Fixed dtype in EvalExpr.
- Allowed aggregations on AggState::LiteralScalar.
- Dispatched to group_aware for fallible expressions with masked out elements.
- Fixed error for arr.sum() on small integer Array dtypes containing nulls.
- Fixed regression on write_database() to Snowflake due to unsupported string view type.
- Fixed XOR not following Kleene logic when one side is unit-length.
- Fixed incorrect precision in Series.str.to_decimal.
- Used 'overlapping' instead of 'rolling' as the name for rolling groups.
- Fixed iterable on dynamic_group_by and rolling object.
- Used Kahan summation for in-memory group-by sum/mean.
- Released GIL in PythonScan predicate evaluation.
- Fixed type error in bitmask::nth_set_bit_u64.
- Corrected str.replace with missing pattern.
- Ensured schema_overrides is respected when loading iterable row data.
- Supported decimal_comma on Decimal type in write_csv.
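For readers unfamiliar with the Kahan summation mentioned in the list above, the following is a minimal pure-Python sketch of the general technique. It is not Polars' implementation (which lives in the Rust engine); the function name `kahan_sum` and the sample data are hypothetical and chosen only to show why compensated summation reduces floating-point error.

```python
import math


def kahan_sum(values):
    """Sum floats while tracking the low-order bits lost at each addition."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for x in values:
        y = x - compensation            # apply the correction from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (negated)
        total = t
    return total


# One large value followed by many values too small to change it individually.
values = [1.0] + [1e-16] * 100_000

naive = 0.0
for v in values:
    naive += v

compensated = kahan_sum(values)
reference = math.fsum(values)  # correctly rounded sum from the standard library

print("naive error:      ", abs(naive - reference))
print("compensated error:", abs(compensated - reference))
```

The naive loop never moves off 1.0 because each small addend is rounded away, while the compensated sum accumulates the lost bits and stays close to the reference value.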
🔧 Affected Symbols
group_by_dynamic, scan_iceberg, scan_ipc, filter, drop_nulls, drop_nans, cumulative_eval, DslPlan, null_count, any, all, reverse, Dataframe, list.agg, arr.agg, Expr.rolling_rank, Series.rolling_rank, read_database_uri, Series, DataFrame, CSPE, arr.eval, read_database, iter_batches, rolling_sum, rolling_mean, DataFrame.unnest, LazyFrame.unnest, union(), name.replace, Expr.dt.truncate, read_parquet_metadata, partition_by, over, df.filter, explode, with_row_index(), scan(), list.sum(), Series.shrink_dtype, EvalExpr, arr.sum(), write_database(), Snowflake, XOR, Series.str.to_decimal, bitmask::nth_set_bit_u64, str.replace, schema_overrides, write_csv, pl.format, GroupByPartitioned, element(), AExpr::Element, ScanOptions, new_from_ipc, FunctionExpr, ApplyExpr, DataType, pl.field, days_in_month