py-1.40.0

📦 polars
9 features · 🐛 52 fixes · 2 deprecations · 🔧 66 symbols

Summary

This release introduces streaming support for grouped AsOf joins and numerous performance optimizations across the engine, particularly in streaming operations. Several bugs related to joins, aggregations, and data reading have been fixed, and support for the dataframe interchange protocol has been deprecated.

Migration Steps

  1. The default column name produced by `scan_lines`/`read_lines` changed from "lines" to "line". Update any code that relies on the old name.
  2. If your code relied on `multiprocessing.dummy.Pool` being used, note that it has been replaced by `ThreadPoolExecutor`.
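
For readers less familiar with the two pool types named in step 2, the following is a minimal standard-library sketch of how the same thread-based mapping looks with each. It does not reproduce Polars internals; the `square` function and the input list are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.dummy import Pool  # thread-backed pool with the multiprocessing API


def square(x: int) -> int:
    """Hypothetical workload used only for illustration."""
    return x * x


items = [1, 2, 3, 4]

# Older style: multiprocessing.dummy.Pool runs the tasks on threads
# and Pool.map returns a list directly.
with Pool(2) as pool:
    old_result = pool.map(square, items)

# Newer style: ThreadPoolExecutor.map returns a lazy iterator,
# so it is wrapped in list() to materialize the results in order.
with ThreadPoolExecutor(max_workers=2) as executor:
    new_result = list(executor.map(square, items))

print(old_result, new_result)
```

Both calls preserve input order. The main practical difference is that `ThreadPoolExecutor.map` is lazy and surfaces worker exceptions when the results are iterated.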

✨ New Features

  • Streaming support added for grouped AsOf join.
  • Added `ignore_nulls` argument to `{list,arr}.{any,all}` operations.
  • Added `is_unique` method to list/array dtypes.
  • Implemented streaming sources for pyarrow datasets.
  • Added `pl.merge_sorted` operating on multiple frames.
  • Allow `group_by()` without key expressions.
  • `unnest()` now applies to all columns by default.
  • Added native streaming implementation for `interpolate`.
  • Streaming `strptime` now supports `format=None`.

🐛 Bug Fixes

  • Updated `groups` to the correct length for `Implode` operations.
  • Fixed `scan_csv` with `missing_columns='insert'` overwriting existing data with NULLs.
  • Raise error on non-numeric inputs in `pl.int_ranges`.
  • Fixed always-true filter conversion to Iceberg filter.
  • Nulls are no longer skipped when enumerating over rows in grouped AsOf join.
  • Fixed `pivot` dropping data for null values in the 'on' column.
  • Resolved multiple files deadlock in CSV async reader.
  • Widen decimal precision on sum aggregation.
  • Corrected `lf.remote` type.
  • `LazyFrame.map_batches` now defaults to no optimizations.
  • Extended `StructEval` schema context in `StackOptimizer`.
  • Preserved nulls when casting from all-null `Series` to `Struct`.
  • Fixed `scan_delta` filter on empty dataframe.
  • Prevented `DataFrame` creation panic on `list[struct]` with heterogeneous types.
  • Fixed named aggregation `__structify` being ignored.
  • Skipped `null` group entries when collecting AsOf-by groups.
  • Fixed panic with empty `order_by` in over expression.
  • `sink_parquet` now writes field IDs.
  • Fixed statistics for Null columns in Parquet.
  • Do not prune sort nodes containing slice with dynamic predicate.
  • Corrected grouped `Binary` `arg_min`/`arg_max` and `String` single-element arg indices.
  • Resolved multiple files deadlock in NDJSON async reader.
  • Fixed overflow panic in interpolate nearest.
  • Used checked arithmetic in `int96_to_i64_ns` to prevent overflow panic.
  • CSV fast count is no longer triggered if predicate is pushed down.
  • Supported all integer dtypes for Series index assignment.
  • Fixed incorrect lowering of streaming sort by-expressions.
  • Replaced `multiprocessing.dummy.Pool` with `ThreadPoolExecutor`.
  • Reset IO metrics instead of consuming them.
  • Output SVG if `output_path` ends with '.svg' in `show_graph`.
  • `describe` now skips extension types for min/max.
  • Addressed a potential overflow in `from_epoch` scaling.
  • Fixed incorrect IO metrics on multi-phase streaming execution.
  • Applied scalar bound in `clip` when the Series bound contains nulls.
  • Preserved casts for horizontal operations with untyped literals.
  • Rejected invalid input to `sql_expr`.
  • Ensured SQL `COUNT(<lit>)` expressions return the correct value.
  • Fixed regression in `replace_strict` for enums.
  • Made `test_group_by_arg_max_boolean_26978` non-flaky for `max_by` ties.
  • Corrected null count for aggregated list inside count aggregation.
  • Fixed panic in streaming `MergeSortedNode`.
  • Prevented panic in `transpose()` with mixed List and non-List columns.
  • Set sorted flag for Boolean and Time types.
  • Resolved stack overflow on `merge_sorted` and `union`.
  • Made `pl.DataFrame.fill_null` work on columns with `Null` dtype.
  • Fixed repeated word typos in comments.
  • Covariance with a constant now returns zero instead of NaN.
  • Projection pushdown no longer removes `set_sorted`.
  • Nulls are now inferred when a DataFrame is created from an empty struct.
  • Corrected suggestion in multi-expr filter error.
  • Implemented `agg_arg_min`/`agg_arg_max` for `boolean` data type.
  • Ensured `sample()` respects the global set seed.

Affected Symbols

⚡ Deprecations

  • Support for dataframe interchange protocol is deprecated.
  • The parameter `ddof` in `rolling_corr` is ignored and deprecated.