Changelog

py-1.35.0-beta.1

📦 polars
✨ 18 features · 🐛 42 fixes · 🔧 61 symbols

Summary

This release focuses heavily on performance improvements across group-by operations, data serialization, and string parsing. Numerous enhancements were added, including new aggregation methods and improved ADBC engine integration, alongside fixes for various edge cases and regressions.

Migration Steps

  1. If you were relying on specific behavior related to string values declared with temporal dtypes during Series initialization, note that initialization is now consistent with DataFrame initialization.
  2. If you were using 'rolling' groups, they have been extended and renamed to 'overlapping'.
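
The rename in step 2 reflects what these groups actually are: each row anchors its own window, so groups overlap rather than partition the data. A minimal pure-Python sketch of that semantics follows; the function name and the `(t - period, t]` window bound are illustrative assumptions, not polars' implementation.

```python
def overlapping_groups(index: list[int], period: int) -> list[list[int]]:
    """Illustrative sketch (not polars code): for each row i, the group
    holds every row whose index value falls in (index[i] - period, index[i]].
    Windows overlap rather than partition the data, hence the name
    'overlapping' (formerly 'rolling')."""
    return [
        [j for j, v in enumerate(index) if right - period < v <= right]
        for right in index
    ]

# Rows can appear in several groups at once.
groups = overlapping_groups([1, 2, 4, 7], period=2)  # [[0], [0, 1], [2], [3]]
```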

✨ New Features

  • Added environment variable to roundtrip empty struct in Parquet.
  • Implemented fast-count for scan_iceberg().select(len()).
  • Added 'glob' parameter to scan_ipc.
  • Added list.agg and arr.agg functionality.
  • Implemented {Expr,Series}.rolling_rank().
  • Made Series initialization consistent with DataFrame initialization for string values declared with temporal dtype.
  • Added support for MergeSorted in CSPE.
  • Implemented cumulative_eval using the group-by engine.
  • Implemented native null_count, any and all group-by aggregations.
  • Added streaming engine per-node metrics.
  • Added arr.eval.
  • Added 'separator' to {Data,Lazy}Frame.unnest.
  • Added union() function for unordered concatenation.
  • Added name.replace to the set of column rename options.
  • Added support for np.ndarray -> AnyValue conversion.
  • Allow duration strings with extra leading "+".
  • Added support for UInt128 to pyo3-polars.
  • Added Expr.sign for Decimal datatype.
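
The duration-string change above (tolerating an extra leading "+") can be illustrated with a simplified, hypothetical parser for polars-style duration strings such as "3d12h". The function name, the regex, and the unit table are assumptions made for this sketch; polars' actual parser is in Rust and supports more units.

```python
import re
from datetime import timedelta

# Simplified subset of duration units; the real grammar has more
# (e.g. "ms", "us", "mo"), which this sketch deliberately omits.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}

def parse_duration(spec: str) -> timedelta:
    """Hypothetical sketch of duration-string parsing."""
    spec = spec.removeprefix("+")  # an extra leading "+" is now tolerated
    parts = re.findall(r"(\d+)([dhms])", spec)
    if not parts:
        raise ValueError(f"invalid duration string: {spec!r}")
    return timedelta(**{_UNITS[unit]: int(count) for count, unit in parts})

d = parse_duration("+3d12h")  # timedelta(days=3, hours=12)
```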

🐛 Bug Fixes

  • Addressed group_by_dynamic slowness in sparse data.
  • Pushed filters to PyIceberg.
  • Implemented native filter/drop_nulls/drop_nans in group-by context.
  • Prevented generation of copies of DataFrames in DslPlan serialization.
  • Sped up reverse in group-by context.
  • Pruned unused categorical values when exporting to arrow/parquet/IPC/pickle.
  • Stopped checking duplicates on streaming simple projection in release mode.
  • Lowered approx_n_unique to the streaming engine.
  • Optimized Duration/interval string parsing (2-5x faster).
  • Used native reducer for first/last on Decimals, Categoricals and Enums.
  • Implemented indexed method for BitMapIter::nth.
  • Pushed down slices on plans within unions.
  • Properly released the GIL for read_parquet_metadata.
  • Broadcasted partition_by columns in over expression.
  • Cleared index cache on stacked df.filter expressions.
  • Fixed 'explode' mapping strategy on scalar value.
  • Fixed repeated with_row_index() after scan() being silently ignored.
  • Correctly returned min and max for enums in groupby aggregation.
  • Refactored BinaryExpr in group_by dispatch logic.
  • Fixed aggstate for gather.
  • Kept scalars for length preserving functions in group_by.
  • Fixed duplicate select panic.
  • Fixed inconsistency of list.sum() result type with None values.
  • Fixed division by zero in Expr.dt.truncate.
  • Fixed potential deadlock in __arrow_c_stream__.
  • Allowed double aggregations in group-by contexts.
  • Fixed Series.shrink_dtype for i128/u128.
  • Fixed dtype in EvalExpr.
  • Allowed aggregations on AggState::LiteralScalar.
  • Dispatched to group_aware for fallible expressions with masked out elements.
  • Fixed error for arr.sum() on small integer Array dtypes containing nulls.
  • Fixed regression on write_database() to Snowflake due to unsupported string view type.
  • Fixed XOR not following Kleene logic when one side is unit-length.
  • Fixed incorrect precision in Series.str.to_decimal.
  • Used 'overlapping' instead of 'rolling'.
  • Fixed iteration on dynamic_group_by and rolling objects.
  • Used Kahan summation for in-memory groupby sum/mean.
  • Released GIL in PythonScan predicate evaluation.
  • Fixed type error in bitmask::nth_set_bit_u64.
  • Corrected str.replace with missing pattern.
  • Ensured schema_overrides is respected when loading iterable row data.
  • Supported decimal_comma on Decimal type in write_csv.
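
The in-memory group-by sum/mean fix above switches to Kahan (compensated) summation, which carries a running correction term so small addends are not lost against a large partial sum. The sketch below is the textbook form of the technique in pure Python, not polars' Rust implementation.

```python
def kahan_sum(xs) -> float:
    """Compensated (Kahan) summation: textbook illustration, not
    polars code. The compensation term recovers low-order bits that
    naive accumulation would discard."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # what was lost when adding y to total
        total = t
    return total

# 0.1 is not exactly representable in binary floating point, so a
# naive running sum drifts; the compensated sum stays near 100.0.
xs = [0.1] * 1000
naive = sum(xs)
compensated = kahan_sum(xs)
```

For group-by mean, the same accumulator is reused per group and divided by the group length, so each group's sum carries its own compensation term.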

🔧 Affected Symbols

group_by_dynamic, scan_iceberg, scan_ipc, filter, drop_nulls, drop_nans, cumulative_eval, DslPlan, null_count, any, all, reverse, Dataframe, list.agg, arr.agg, Expr.rolling_rank, Series.rolling_rank, read_database_uri, Series, DataFrame, CSPE, arr.eval, read_database, iter_batches, rolling_sum, rolling_mean, Data.unnest, LazyFrame.unnest, union(), name.replace, Expr.dt.truncate, read_parquet_metadata, partition_by, over, df.filter, explode, with_row_index(), scan(), list.sum(), Series.shrink_dtype, EvalExpr, arr.sum(), write_database(), Snowflake, XOR, Series.str.to_decimal, bitmask::nth_set_bit_u64, str.replace, schema_overrides, write_csv, pl.format, GroupByPartitioned, element(), AExpr::Element, ScanOptions, new_from_ipc, FunctionExpr, ApplyExpr, DataType, pl.field, days_in_month