
py-1.31.0

Breaking Changes
📦 polars
1 breaking · 12 features · 🐛 49 fixes · 1 deprecation · 🔧 22 symbols

Summary

This release removes the old streaming engine, introduces DataType expressions and native Iceberg positional delete support, deprecates `allow_missing_columns` in `scan_parquet`, and includes performance optimizations along with 49 bug fixes.

⚠️ Breaking Changes

  • The old streaming engine has been removed. Users relying on the previous streaming implementation must update their code to use the new engine.

✨ New Features

  • Introduction of DataType expressions in Python.
  • Native implementation for Iceberg positional deletes.
  • Basic implementation of `DataTypeExpr` in the Rust DSL.
  • Added `required: bool` to `ParquetFieldOverwrites`.
  • Support serializing `name.map_fields`.
  • Support serializing `Expr::RenameAlias`.
  • Added `keys` column in `finish_callback`.
  • Added `extra_columns` parameter to `scan_parquet`.
  • Added the `CORR` function to Polars SQL.
  • Added per partition sort and finish callback to sinks.
  • Support values sorted in descending order in `search_sorted()`.
  • Derive DSL schema.
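
To make the `search_sorted()` item concrete, here is a pure-Python sketch of what a sorted search over descending values means. It is not the Polars implementation; the function name `search_sorted_desc` and its `side` parameter are hypothetical and chosen only for this illustration.

```python
def search_sorted_desc(values, x, side="left"):
    """Return the index at which x can be inserted into a list sorted in
    descending order so that the list stays sorted.

    side="left" inserts before equal elements, side="right" after them.
    (Hypothetical helper for illustration; not part of Polars.)
    """
    lo, hi = 0, len(values)
    while lo < hi:
        mid = (lo + hi) // 2
        if side == "left":
            # Move right while elements are strictly greater than x.
            go_right = values[mid] > x
        else:
            # Move right while elements are greater than or equal to x.
            go_right = values[mid] >= x
        if go_right:
            lo = mid + 1
        else:
            hi = mid
    return lo


descending = [9, 7, 5, 5, 3, 1]
print(search_sorted_desc(descending, 6))                 # 2
print(search_sorted_desc(descending, 5, side="left"))    # 2
print(search_sorted_desc(descending, 5, side="right"))   # 4
```

The only difference from the usual ascending binary search is the direction of the comparisons.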

🐛 Bug Fixes

  • Removed axis from `show_graph`.
  • Removed axis ticks in `show_graph`.
  • Restricted custom `aggregate_function` in `pivot` to `pl.element()`.
  • Fixed `SourceToken` leak in in-memory sink linearize.
  • Fixed panic when reading empty parquet with multiple boolean columns.
  • Raised `ComputeError` instead of panicking in `truncate` when mixing month/week/day/sub-daily units.
  • Materialized `list.eval` with unknown type.
  • Set the sorting flag only for the first column when Parquet `SortingColumns` are present.
  • Fixed typo in AExprBuilder.
  • Fixed null return from var/std on scalar column.
  • Supported Datetime broadcast in `list.concat`.
  • Ensured projection pushdown maintains right table schema.
  • Added Null dtype support to arg_sort_by.
  • Raised an error by default on invalid CSV quotes.
  • Fixed group_by mean and median returning all nulls for Decimal dtype.
  • Fixed hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__`.
  • Fixed `AssertionError` when using `scan_delta()` on AWS with `storage_options`.
  • Fixed deadlock on `collect(background=True)` / `collect_concurrently()`.
  • Fixed incorrect null count in rolling_min/max.
  • Preserved `file://` in LazyFrame node traverser.
  • Respected column order in `register_io_source` schema.
  • Stopped calling unnest for objects implementing `__arrow_c_array__`.
  • Fixed incorrect output when using `sort` with `group_by` and `cum_sum`.
  • Implemented owned arithmetic for Int128.
  • Stopped schema-matching structs with different field counts.
  • Fixed confusing error message on duplicate row_index.
  • Added `include_nulls` to `Agg::Count` CSE check.
  • Fixed view buffer exceeding 2^32 - 1 bytes in concatenate_view.
  • Fixed incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines`.
  • Allowed for IO plugins with reordered columns in streaming.
  • Fixed inconsistency in `str.zfill` method when string contained leading '+'.
  • Fixed integer underflow in `propagate_nulls`.
  • Fixed setting `compat_level=0` for `sink_ipc`.
  • Narrowed return type for `DataType.is_`, improving Pyright's type completeness.
  • Supported arrow Decimal32 and Decimal64 types.
  • Guarded against dictionaries being passed to projection keywords.
  • Updated arrow format.
  • Fixed filter pushdown to IO plugins.
  • Improved numeric stability for rolling_mean<f32>.
  • Guarded against invalid nested objects in `map_elements`.
  • Allowed subclasses in type equality checking.
  • Returned early in `pl.Expr.__array_ufunc__` when there is only a single input.
  • Added inline implodes in type coercion.
  • Added `top_k_by` and `bottom_k_by` to Series.
  • Corrected `int_ranges` to raise error on invalid inputs.
  • Stopped silently overflowing for temporal casts.
  • Fixed error using `write_csv` with `storage_options`.
  • Fixed schema resolution for `.over(mapping_strategy="join")` with non-aggregations.
  • Ensured rename behaves the same as select.
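
For the `str.zfill` fix, a useful reference point is how Python's built-in `str.zfill` treats a leading sign. The sketch below shows only the standard-library behaviour; that the Polars fix targets the same convention is an assumption, since the notes above do not describe the corrected output.

```python
# Python's built-in zfill keeps a leading '+' or '-' in front of the padding.
samples = ["+5", "-5", "5", "+123"]
padded = {s: s.zfill(4) for s in samples}

for original, result in padded.items():
    print(f"{original!r:>7} -> {result!r}")
```

A string that already has the requested width, such as `"+123"` here, is returned unchanged.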

🔧 Affected Symbols

`scan_parquet`, `pivot`, `truncate`, `list.eval`, `arg_sort_by`, `rolling_min`, `rolling_max`, `rolling_mean<f32>`, `cum_sum`, `str.zfill`, `pl.Expr.__array_ufunc__`, `top_k_by`, `bottom_k_by`, `int_ranges`, `write_csv`, `register_io_source`, `scan_delta`, `collect(background=True)`, `collect_concurrently()`, `scan_csv`, `rename`, `select`

⚡ Deprecations

  • The `allow_missing_columns` parameter in `scan_parquet` is deprecated in favor of the new `missing_columns` parameter.