
py-1.35.0

📦 polars
22 features · 🐛 46 fixes · 2 deprecations · 🔧 83 symbols

Summary

This release focuses heavily on performance improvements across group-by operations, native aggregations, and data parsing. It also stabilizes Decimal support and introduces several new features, such as ewm_mean() in the streaming engine and new list/array aggregations (list.agg, arr.agg). Two functions, Expr.agg_groups() and pl.groups(), have been deprecated.
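ewm_mean() suits a streaming engine because it can be updated one value at a time with constant state. The plain-Python sketch below illustrates this for the adjusted form (adjust=True, the Polars default). ewm_mean_stream is a hypothetical helper, not the Polars implementation, and it ignores null handling and the other ewm_mean() options.

```python
from typing import Iterable, Iterator


def ewm_mean_stream(chunks: Iterable[list[float]], alpha: float) -> Iterator[float]:
    """Yield the adjusted exponentially weighted mean after each value.

    Input arrives in chunks, as it would in a streaming engine; only two
    running sums are carried between values, so memory use is constant.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    decay = 1.0 - alpha
    num = 0.0  # sum over k of decay**k * x[t - k]
    den = 0.0  # sum over k of decay**k
    for chunk in chunks:
        for x in chunk:
            num = x + decay * num
            den = 1.0 + decay * den
            yield num / den


# Chunk boundaries do not affect the result.
print(list(ewm_mean_stream([[1.0, 2.0], [3.0]], alpha=0.5)))
```

For [1, 2, 3] with alpha=0.5 this yields approximately 1.0, 1.667 and 2.429: the weights 1, 0.5 and 0.25 are applied to the most recent values first, then normalized by their sum.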

Migration Steps

  1. If you use Expr.agg_groups() or pl.groups(), migrate away from them. These notes do not name replacements, so consult the deprecation warnings or the Polars API reference for the recommended alternatives.
  2. If you initialize a Series from string values with a temporal dtype, expect the same behavior as DataFrame initialization, which may differ from earlier releases.
  3. If empty struct columns do not roundtrip through Parquet, try enabling the environment variable added for this in this release (see the upstream Polars release notes for its name).
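To find remaining calls to Expr.agg_groups() or pl.groups(), one approach is to record deprecation warnings while exercising your code. This is a minimal sketch, assuming Polars reports these deprecations as DeprecationWarning or a subclass of it; find_deprecations and old_api are hypothetical names used only for illustration.

```python
import warnings
from typing import Any, Callable


def find_deprecations(fn: Callable[[], Any]) -> list[str]:
    """Run fn and return the messages of any DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops repeated warnings from the same call site being hidden.
        warnings.simplefilter("always", DeprecationWarning)
        fn()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


def old_api() -> int:
    """Hypothetical stand-in for a deprecated call such as pl.groups()."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 1


print(find_deprecations(old_api))  # ['old_api() is deprecated']
```

Running a test suite with `python -W error::DeprecationWarning` is a blunter alternative that turns every deprecation warning into an exception.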

✨ New Features

  • Decimal support is stabilized.
  • Support for ewm_mean() in the streaming engine.
  • Improved row-count estimates.
  • Introduced a remote Polars MCP server.
  • Allowed local scans on Polars Cloud (configurable).
  • Added Expr.item to strictly extract a single value from an expression.
  • Added environment variable to roundtrip empty struct in Parquet.
  • Added a fast row-count path for scan_iceberg(...).select(pl.len()).
  • Added glob parameter to scan_ipc.
  • Added list.agg and arr.agg.
  • Implemented {Expr,Series}.rolling_rank().
  • Made Series initialization consistent with DataFrame initialization for string values declared with a temporal dtype.
  • Supported MergeSorted nodes in CSPE (common subplan elimination).
  • Duration/interval string parsing is 2-5x faster.
  • CSPE is now applied recursively.
  • Added streaming engine per-node metrics.
  • Added arr.eval.
  • Added a union() function for unordered concatenation.
  • Added name.replace to the set of column rename options.
  • Supported np.ndarray -> AnyValue conversion.
  • Allowed duration strings with a leading "+" sign.
  • Added support for UInt128 to pyo3-polars.
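Duration strings combine integer counts with unit suffixes, for example "1d12h", and this release also accepts an explicit leading "+". The simplified plain-Python parser below illustrates that grammar. It is a sketch under assumptions rather than the Polars parser: parse_duration is a hypothetical helper, and its unit list (ns, us, ms, s, m, h, d, w, mo, q, y, i) is modeled on the suffixes described in the Polars documentation.

```python
import re

# Unit suffixes modeled on Polars duration strings. "ms" and "mo" are listed
# before "m" so that the regex alternation does not stop at the shorter prefix.
UNITS = "ns|us|ms|mo|s|m|h|d|w|q|y|i"
PART_RE = re.compile(rf"(\d+)({UNITS})")
FULL_RE = re.compile(rf"([+-]?)((?:\d+(?:{UNITS}))+)")


def parse_duration(text: str) -> dict[str, int]:
    """Parse e.g. "+1d12h" into {"d": 1, "h": 12}; a leading "-" negates all parts."""
    match = FULL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration string: {text!r}")
    sign = -1 if match.group(1) == "-" else 1
    parts: dict[str, int] = {}
    for count, unit in PART_RE.findall(match.group(2)):
        parts[unit] = parts.get(unit, 0) + sign * int(count)
    return parts


print(parse_duration("+1d12h"))  # {'d': 1, 'h': 12}
```

A leading "+" only makes the default positive sign explicit, so "+1d12h" and "1d12h" parse to the same value.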

🐛 Bug Fixes

  • Re-enabled CPU feature check before import.
  • Implemented read_excel workaround for fastexcel/calamine issue loading a column subset from a named table.
  • Fixed the correctness of any(ignore_nulls) and an out-of-bounds access in all.
  • Fixed streaming any/all with ignore_nulls=False.
  • Fixed incorrect join_asof on a cast expression.
  • Reduced memory usage for rolling groups in ApplyExpr.
  • PyArrow scans now fall back to the in-memory engine.
  • Made Operator::swap_operands return the correct operators for Plus, Minus, Multiply and Divide.
  • Capitalized letters after numbers in to_titlecase.
  • Preserved null values in pct_change.
  • Raised a length-mismatch error for over() with sliced groups.
  • Added a duplicate-name check to transpose.
  • Made any/all follow Kleene logic in group-by.
  • Do not optimize cross join to iejoin if order maintaining.
  • Fixed partially unknown type hints for scan_parquet.
  • Properly released the GIL for read_parquet_metadata.
  • Broadcast partition_by columns in over expressions.
  • Cleared index cache on stacked df.filter expressions.
  • Fixed 'explode' mapping strategy on scalar value.
  • Fixed repeated with_row_index() calls after scan() being silently ignored.
  • Fixed min and max returning incorrect results for Enums in group-by aggregations.
  • Refactored BinaryExpr in group_by dispatch logic.
  • Fixed aggstate for gather.
  • Kept scalars for length preserving functions in group_by.
  • Fixed duplicate select panic.
  • Fixed inconsistency of list.sum() result type with None values.
  • Fixed division by zero in Expr.dt.truncate.
  • Fixed potential deadlock in __arrow_c_stream__.
  • Allowed double aggregations in group-by contexts.
  • Fixed Series.shrink_dtype for i128/u128.
  • Fixed dtype in EvalExpr.
  • Allowed aggregations on AggState::LiteralScalar.
  • Dispatched to group_aware for fallible expressions with masked out elements.
  • Fixed error for arr.sum() on small integer Array dtypes containing nulls.
  • Fixed regression on write_database() to Snowflake due to unsupported string view type.
  • Fixed XOR not following Kleene logic when one side has unit length.
  • Fixed incorrect precision in Series.str.to_decimal.
  • Used overlapping instead of rolling.
  • Fixed iteration over dynamic group-by and rolling objects.
  • Used Kahan summation for in-memory group-by sum/mean.
  • Released GIL in PythonScan predicate evaluation.
  • Fixed type error in bitmask::nth_set_bit_u64.
  • Added Expr.sign for Decimal datatype.
  • Corrected str.replace with missing pattern.
  • Ensured schema_overrides is respected when loading iterable row data.
  • Supported decimal_comma on Decimal type in write_csv.
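Several of the fixes above concern Kleene (three-valued) logic, in which a null means "unknown": any/all with ignore_nulls=False, any/all in group-by, and XOR with a unit-length operand. The plain-Python sketch below models those semantics with None as null, following the behavior the Polars documentation describes for any() and all(). kleene_any, kleene_all and xor_columns are hypothetical helpers, not Polars internals.

```python
from typing import Iterable, Optional

Bool3 = Optional[bool]  # True, False, or None for null/unknown


def kleene_any(values: Iterable[Bool3], ignore_nulls: bool = True) -> Bool3:
    """OR-reduce values; with ignore_nulls=False a null can make the result unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True decides OR, whatever the nulls are
        if v is None:
            saw_null = True
    return None if saw_null and not ignore_nulls else False


def kleene_all(values: Iterable[Bool3], ignore_nulls: bool = True) -> Bool3:
    """AND-reduce values; with ignore_nulls=False a null can make the result unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False decides AND, whatever the nulls are
        if v is None:
            saw_null = True
    return None if saw_null and not ignore_nulls else True


def xor_columns(left: list[Bool3], right: list[Bool3]) -> list[Bool3]:
    """Element-wise XOR; a unit-length side is broadcast and null XOR x is null."""
    if len(left) == 1:
        left = left * len(right)
    elif len(right) == 1:
        right = right * len(left)
    if len(left) != len(right):
        raise ValueError("length mismatch")
    return [None if a is None or b is None else a != b for a, b in zip(left, right)]


print(kleene_any([False, None], ignore_nulls=False))  # None
print(xor_columns([None], [True, False]))  # [None, None]
```

The broadcast case matters because a unit-length null must still produce null in every output row, not a plain True or False.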

🔧 Affected Symbols

foldhash · hashbrown · unique · n_unique · take{_slice,}_unchecked · skew · kurtosis · bitwise_* · group_by_dynamic · PyIceberg · filter/drop_nulls/drop_nans · cumulative_eval · DslPlan · null_count · any · all · reverse · arrow/parquet/IPC/pickle export · approx_n_unique · Duration/interval string parsing · first/last aggregation on Decimals, Categoricals and Enums · BitMapIter::nth · ewm_mean() · scan_iceberg · Expr.item · scan_ipc · list.agg · arr.agg · Expr.rolling_rank() · Series.rolling_rank() · read_database_uri · Series initialization (string values with temporal dtype) · CSPE · arr.eval · read_database · iter_batches · rolling_(sum|mean) · Data.unnest · LazyFrame.unnest · name.replace · np.ndarray -> AnyValue conversion · DataFrame load from list of dicts (schema_overrides cast) · UInt128 · read_excel · fastexcel/calamine · join_asof · ApplyExpr · scan_parquet · read_parquet_metadata · partition_by · over expression · df.filter · explode · with_row_index() · scan() · BinaryExpr · gather · range feature · dtype-array feature · list.sum() · Expr.dt.truncate · __arrow_c_stream__ · Series.shrink_dtype · EvalExpr · AggState::LiteralScalar · arr.sum() · write_database() to Snowflake · XOR operation · Series.str.to_decimal · Expr.sign · str.replace · write_csv · Expr.agg_groups() · pl.groups() · LazyFrame.set_sorted · FunctionIR::HintGroupByPartitioned · element() · AExpr::Element · Expr::Element · ScanOptions · new_from_ipc · delta te

⚡ Deprecations

  • Expr.agg_groups() is deprecated.
  • pl.groups() is deprecated.