py-1.31.0
📦 polars · ⚠ 1 breaking · ✨ 12 features · 🐛 49 fixes · ⚡ 1 deprecation · 🔧 22 symbols
Summary
This release removes the old streaming engine, introduces DataType expressions and Iceberg positional delete support, and includes numerous performance optimizations and bug fixes across various operations.
⚠️ Breaking Changes
- The old streaming engine has been removed. Users relying on the previous streaming implementation must update their code to use the new engine.
✨ New Features
- Introduction of DataType expressions in Python.
- Native implementation for Iceberg positional deletes.
- Basic implementation of `DataTypeExpr` in the Rust DSL.
- Added `required: bool` to `ParquetFieldOverwrites`.
- Support serializing `name.map_fields`.
- Support serializing `Expr::RenameAlias`.
- Added `keys` column in `finish_callback`.
- Added `extra_columns` parameter to `scan_parquet`.
- Added `CORR` function to Polars SQL (see the sketch after this list).
- Added per-partition sort and finish callback to sinks.
- Support descendingly-sorted values in `search_sorted()`.
- Derive DSL schema.
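A minimal sketch of the new SQL `CORR` aggregate mentioned above, assuming its standard two-argument (Pearson correlation) form; the column names and data are illustrative only.

```python
import polars as pl

df = pl.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
    }
)

# CORR(x, y) computes the Pearson correlation coefficient of the two
# columns; with perfectly linearly related data the result is 1.0.
out = df.sql("SELECT CORR(x, y) AS xy_corr FROM self")
print(out)
```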
🐛 Bug Fixes
- Removed axis from `show_graph`.
- Removed axis ticks in `show_graph`.
- Restricted custom `aggregate_function` in `pivot` to `pl.element()`.
- Fixed `SourceToken` leak in in-memory sink linearize.
- Fixed panic when reading empty parquet with multiple boolean columns.
- `truncate` now raises ComputeError instead of panicking when mixing month/week/day/sub-daily units (see the sketch after this list).
- Materialized `list.eval` with unknown type.
- Only set the sorting flag for the first column with Parquet SortingColumns.
- Fixed typo in AExprBuilder.
- Fixed null return from var/std on scalar column.
- Supported Datetime broadcast in `list.concat`.
- Ensured projection pushdown maintains right table schema.
- Added Null dtype support to `arg_sort_by`.
- Now raises an error by default on invalid CSV quotes.
- Fixed `group_by` mean and median returning all nulls for Decimal dtype.
- Fixed hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__`.
- Fixed `AssertionError` when using `scan_delta()` on AWS with `storage_options`.
- Fixed deadlock on `collect(background=True)` / `collect_concurrently()`.
- Fixed incorrect null count in `rolling_min`/`rolling_max`.
- Preserved `file://` in LazyFrame node traverser.
- Respected column order in `register_io_source` schema.
- Stopped calling unnest for objects implementing `__arrow_c_array__`.
- Fixed incorrect output when using `sort` with `group_by` and `cum_sum`.
- Implemented owned arithmetic for Int128.
- Stopped schema-matching structs with different field counts.
- Fixed confusing error message on duplicate row_index.
- Added `include_nulls` to `Agg::Count` CSE check.
- Fixed view buffer exceeding 2^32 - 1 bytes in concatenate_view.
- Fixed incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines`.
- Allowed for IO plugins with reordered columns in streaming.
- Fixed an inconsistency in the `str.zfill` method when the string contains a leading `'+'`.
- Fixed integer underflow in `propagate_nulls`.
- Fixed setting `compat_level=0` for `sink_ipc`.
- Narrowed return type for `DataType.is_`, improving Pyright's type completeness.
- Supported Arrow Decimal32 and Decimal64 types.
- Guarded against dictionaries being passed to projection keywords.
- Updated arrow format.
- Fixed filter pushdown to IO plugins.
- Improved numeric stability for rolling_mean<f32>.
- Guarded against invalid nested objects in `map_elements`.
- Allowed subclasses in type equality checking.
- Returned early in `pl.Expr.__array_ufunc__` when there is only a single input.
- Added inline implodes in type coercion.
- Added `top_k_by` and `bottom_k_by` to `Series`.
- Corrected `int_ranges` to raise error on invalid inputs.
- Stopped silently overflowing for temporal casts.
- Fixed error using `write_csv` with `storage_options`.
- Fixed schema resolution for `.over(mapping_strategy="join")` with non-aggregations.
- Ensured `rename` behaves the same as `select`.
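A minimal sketch of the `truncate` behavior change called out above: an interval mixing a month-based unit with a sub-daily unit is now reported as a catchable `ComputeError` rather than a panic. The interval string `"1mo30m"` is only an illustrative example of such a mix.

```python
from datetime import datetime

import polars as pl
from polars.exceptions import ComputeError

s = pl.Series("ts", [datetime(2025, 6, 15, 13, 30)])

try:
    # Mixing month-based and sub-daily units in a single interval used to
    # panic; it now surfaces as a ComputeError that can be handled.
    s.dt.truncate("1mo30m")
except ComputeError as exc:
    print(f"ComputeError: {exc}")
```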
🔧 Affected Symbols
`scan_parquet` · `pivot` · `truncate` · `list.eval` · `arg_sort_by` · `rolling_min` · `rolling_max` · `rolling_mean<f32>` · `cum_sum` · `str.zfill` · `pl.Expr.__array_ufunc__` · `top_k_by` · `bottom_k_by` · `int_ranges` · `write_csv` · `register_io_source` · `scan_delta` · `collect(background=True)` · `collect_concurrently()` · `scan_csv` · `rename` · `select`
⚡ Deprecations
- The `allow_missing_columns` parameter in `scan_parquet` is deprecated in favor of the new `missing_columns` parameter.
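A minimal migration sketch for this deprecation. The boolean `allow_missing_columns=True` call is the pre-1.31 spelling; the `missing_columns="insert"` value shown as its replacement is an assumption about the new parameter's accepted options, not taken from the release notes above.

```python
import polars as pl

# Before (deprecated in 1.31): boolean toggle for tolerating files that
# lack some of the expected columns.
lf_old = pl.scan_parquet("data/*.parquet", allow_missing_columns=True)

# After: the missing_columns parameter. "insert" (fill missing columns
# with nulls) is assumed here as the permissive option; check the 1.31
# API reference for the exact accepted values.
lf_new = pl.scan_parquet("data/*.parquet", missing_columns="insert")
```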