rs-0.52.0
📦 polars
✨ 38 features · 🐛 42 fixes · 🔧 20 symbols
Summary
This release focuses heavily on performance improvements across lazy evaluation, group-by operations, and I/O, alongside numerous bug fixes, especially in SQL handling and streaming aggregations. New features include batch collection methods for LazyFrames and enhanced streaming support for various functions.
✨ New Features
- Added `LazyFrame.{sink,collect}_batches` methods.
- Implemented deterministic import order for Python Polars package variants.
- Added support for `ewm_var/std` in the streaming engine.
- Made DSL-hash skippable.
- Implemented streaming `{Expr,LazyFrame}.rolling`.
- Set the user-agent to `polars/<version>`.
- Added `BIT_NOT` support to the SQL interface.
- Supported BYTE_ARRAY backed Decimals in Parquet.
- Added `allow_empty` flag to `item`.
- Supported `ewm_mean()` in the streaming engine.
- Improved row-count estimates.
- Removed filtered scan paths in IR when possible.
- Introduced remote Polars MCP server.
- Allowed local scans on polars cloud (configurable).
- Added `Expr.item` to strictly extract a single value from an expression.
- Added environment variable to roundtrip empty struct in Parquet.
- Added `glob` parameter to `scan_ipc`.
- Added `list.agg` and `arr.agg`.
- Implemented `{Expr,Series}.rolling_rank()`.
- Supported MergeSorted in CSPE.
- Recursively applied CSPE.
- Added streaming engine per-node metrics.
- Added `arr.eval`.
- Added `nth_set_bit_u64()` with unit test.
- Added `separator` to `{Data,Lazy}Frame.unnest`.
- Added `union()` function for unordered concatenation.
- Added `name.replace` to the set of column rename options.
- Allowed duration strings with leading "+".
- Dropped now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts.
- Added support for UInt128 to pyo3-polars.
- Implemented `maintain_order` for cross join.
- Added support to output `dt.total_{}()` duration values as fractionals.
- Supported scanning from `file:/path` URIs.
- Logged which file the schema was sourced from, and which file caused an extra column error.
- Added support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid.
- Added unstable `hidden_file_prefix` parameter to `scan_parquet`.
- Used fixed-scale Decimals.
- Added support for unsigned 128-bit integers.
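The streaming `ewm_mean()` entry above is easier to picture with a small sketch of why an exponentially weighted mean suits batch-at-a-time execution: the result at each row depends only on a small running state, so batches can be processed in order without revisiting earlier data. The code below is an illustration in plain Python, not Polars' implementation. The names `EwmMeanState`, `push_batch`, and `ewm_mean_full` are hypothetical, the weighting is assumed to be the "adjusted" form with weights `(1 - alpha)**i`, and null handling is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EwmMeanState:
    """Running state for an exponentially weighted mean (hypothetical helper).

    Assumes weights (1 - alpha)**i for the value i steps back, so the mean is
    num / den with both sums updated incrementally.
    """

    alpha: float
    num: float = 0.0  # weighted sum of the values seen so far
    den: float = 0.0  # sum of the weights seen so far

    def push_batch(self, batch: Iterable[float]) -> List[float]:
        """Consume one batch and return the EWM mean at each of its rows."""
        decay = 1.0 - self.alpha
        out = []
        for x in batch:
            self.num = x + decay * self.num
            self.den = 1.0 + decay * self.den
            out.append(self.num / self.den)
        return out


def ewm_mean_full(values: Iterable[float], alpha: float) -> List[float]:
    """Reference result: process everything as a single batch."""
    return EwmMeanState(alpha).push_batch(values)


data = [1.0, 2.0, 4.0, 8.0, 16.0]

# Feed the same data in two batches, carrying the state across them.
state = EwmMeanState(alpha=0.5)
batched = state.push_batch(data[:2]) + state.push_batch(data[2:])

# The batched result matches the single-pass result row for row.
assert batched == ewm_mean_full(data, alpha=0.5)
```

Only two floats survive between batches, which is the property that makes this kind of aggregation a natural fit for a streaming engine.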
🐛 Bug Fixes
- Fixed CSV `select(len())` off by 1 with comment prefix.
- Fixed incorrect reshape on sliced lists.
- Supported "index" as column name in `group_by` iterator.
- Ensured DSL_SCHEMA_HASH does not change by line endings.
- Solved multiple issues relating to arena mutation in SQL subqueries.
- Fixed panic in `dt.truncate` for invalid duration strings.
- Stopped triggering `DeprecationWarning` from SQL "IN" constraints that use subqueries.
- Returned the correct string-case `Expr` reprs.
- Fixed `groups` update on slices with different offsets.
- Fixed handling `Null` dtype in `ApplyExpr` on `group_by`.
- Raised error for all/any on list instead of panic.
- Fixed unique key names in streaming sort/top_k.
- Ensured the `SQL` interface uses logical, not bitwise, behaviour for unary "NOT" operator.
- Fixed panic if scan predicate produces 0 length mask.
- Ensured SQL table alias resolution checks against CTE aliases on fallback.
- Fixed panic in `group_by_dynamic` with `group_by` and multiple chunks.
- Fixed panic when using struct field as join key.
- Allowed broadcast in `group_by` for `ApplyExpr` and `BinaryExpr`.
- Fixed field metadata for nested categorical PyCapsule export.
- Blocked predicate pushdown when `group_by` key values are changed.
- Fixed Group-By aggregation problems caused by `AmortSeries`.
- Stopped pushing down predicates past inserted cache nodes.
- Allowed for negative time in `group_by_dynamic` iterator.
- Re-enabled CPU feature check before import.
- Corrected `any(ignore_nulls)` and OOB in `all`.
- Fixed streaming `any`/`all` with `ignore_nulls=False`.
- Fixed incorrect `join_asof` on a casted expression.
- Fixed capitalisation of letters after numbers in `to_titlecase`.
- Preserved null values in `pct_change`.
- Raised length mismatch on `over` with sliced groups.
- Checked for duplicate names in `transpose`.
- Followed Kleene logic in `any` / `all` for group-by.
- Stopped optimizing cross joins to iejoin when order must be maintained.
- Broadcasted `partition_by` columns in `over` expression.
- Cleared index cache on stacked `df.filter` expressions.
- Fixed 'explode' mapping strategy on scalar value.
- Fixed repeated `with_row_index()` calls after `scan()` being silently ignored.
- Correctly returned min and max for enums in groupby aggregation.
- Fixed aggstate for `gather`.
- Kept scalars for length preserving functions in `group_by`.
- Fixed duplicate select panic.
- Fixed inconsistency of `list.sum()` result type.
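Several of the fixes above concern `any` / `all` with nulls and mention Kleene logic. As a reminder of what that three-valued logic means, here is a minimal plain-Python sketch in which `None` stands in for a null. The function names `kleene_any` and `kleene_all` are hypothetical and this is not the Polars code path; it only shows the truth rules a null-aware `any` / `all` is expected to follow when nulls are not ignored.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """True if any value is True; otherwise null if any value is null; else False."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """False if any value is False; otherwise null if any value is null; else True."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else True


# A null only matters when the known values cannot decide the answer.
assert kleene_any([True, None]) is True
assert kleene_any([False, None]) is None
assert kleene_all([False, None]) is False
assert kleene_all([True, None]) is None
```

Under these rules an empty group yields `False` for `any` and `True` for `all`, since there is neither a deciding value nor a null.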
🔧 Affected Symbols
`LazyFrame.{sink,collect}_batches`, `dt.truncate`, `group_by_dynamic`, `ApplyExpr`, `BinaryExpr`, `scan_ipc`, `Expr.item`, `scan_parquet`, `dt.total_{}`, `name.replace`, `unnest`, `union`, `rolling_rank`, `arr.eval`, `list.agg`, `arr.agg`, `join_asof`, `pct_change`, `with_row_index`, `scan`