rs-0.52.0

📅 Nov 3, 2025 · 📦 polars

✨ 38 features · 🐛 42 fixes · 🔧 20 symbols

Summary

This release focuses heavily on performance improvements across lazy evaluation, group-by operations, and I/O, alongside numerous bug fixes, especially in SQL handling and streaming aggregations. New features include the `LazyFrame.sink_batches` and `LazyFrame.collect_batches` methods and streaming-engine support for `ewm_mean`, `ewm_var`/`ewm_std`, and `rolling`.

✨ New Features

Added `LazyFrame.{sink,collect}_batches` methods.
Implemented deterministic import order for Python Polars package variants.
Added support for `ewm_var`/`ewm_std` in the streaming engine.
Made the DSL hash skippable.
Implemented streaming `{Expr,LazyFrame}.rolling`.
Set the user agent to `polars/<version>`.
Added `BIT_NOT` support to the SQL interface.
Added support for `BYTE_ARRAY`-backed Decimals in Parquet.
Added an `allow_empty` flag to `item`.
Added support for `ewm_mean()` in the streaming engine.
Improved row-count estimates.
Removed filtered scan paths in IR when possible.
Introduced a remote Polars MCP server.
Allowed local scans on Polars Cloud (configurable).
Added `Expr.item` to strictly extract a single value from an expression.
Added an environment variable to round-trip empty structs through Parquet.
Added `glob` parameter to `scan_ipc`.
Added `list.agg` and `arr.agg`.
Implemented `{Expr,Series}.rolling_rank()`.
Added support for MergeSorted in CSPE (common subplan elimination).
Applied CSPE recursively.
Added streaming engine per-node metrics.
Added `arr.eval`.
Added `nth_set_bit_u64()`, with a unit test.
Added `separator` to `{Data,Lazy}Frame.unnest`.
Added `union()` function for unordered concatenation.
Added `name.replace` to the set of column rename options.
Allowed duration strings with a leading "+".
Dropped the now-unnecessary post-init `schema_overrides` cast when loading a `DataFrame` from a list of dicts.
Added support for UInt128 to pyo3-polars.
Implemented `maintain_order` for cross joins.
Added support for returning `dt.total_{}()` duration values as fractional numbers.
Added support for scanning from `file:/path` URIs.
Logged which file the schema was sourced from, and which file caused an extra column error.
Added support for displaying the lazy query plan in marimo notebooks without requiring matplotlib or mermaid to be installed.
Added unstable `hidden_file_prefix` parameter to `scan_parquet`.
Switched to fixed-scale Decimals.
Added support for unsigned 128-bit integers.
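One of the additions above is the low-level helper `nth_set_bit_u64()`. The sketch below is a plain-Python model of what a function with that name typically computes, not the Polars code: it assumes `n` is 0-based and that the result is `None` when fewer than `n + 1` bits are set (both conventions are assumptions). It clears the lowest set bit `n` times, then reports the index of the lowest remaining set bit.

```python
from typing import Optional


def nth_set_bit_u64(x: int, n: int) -> Optional[int]:
    """Return the bit index of the n-th (0-based) set bit of a u64, or None.

    Hypothetical model of the helper's semantics; the real Polars
    signature and return convention are assumptions here.
    """
    if not 0 <= x < (1 << 64):
        raise ValueError("x must fit in an unsigned 64-bit integer")
    for _ in range(n):
        if x == 0:
            return None  # ran out of set bits before reaching the n-th
        x &= x - 1  # clear the lowest set bit
    if x == 0:
        return None
    # x & -x isolates the lowest set bit; its position is bit_length - 1.
    return (x & -x).bit_length() - 1
```

For example, `0b1011` has set bits at positions 0, 1, and 3, so `nth_set_bit_u64(0b1011, 2)` returns `3`. Clearing the lowest bit with `x & (x - 1)` costs one step per skipped bit; a native implementation may use hardware bit-manipulation instructions instead.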

🐛 Bug Fixes

Fixed CSV `select(len())` being off by one when a comment prefix is set.
Fixed incorrect reshape on sliced lists.
Supported "index" as column name in `group_by` iterator.
Ensured `DSL_SCHEMA_HASH` does not change with line endings.
Solved multiple issues relating to arena mutation in SQL subqueries.
Fixed panic in `dt.truncate` for invalid duration strings.
Stopped triggering `DeprecationWarning` from SQL "IN" constraints that use subqueries.
Returned the correct string-case `Expr` reprs.
Fixed `groups` update on slices with different offsets.
Fixed handling of `Null` dtype in `ApplyExpr` on `group_by`.
Raised an error instead of panicking for `all`/`any` on lists.
Fixed unique key names in streaming sort/top_k.
Ensured the `SQL` interface uses logical, not bitwise, behaviour for unary "NOT" operator.
Fixed a panic when a scan predicate produces a zero-length mask.
Ensured SQL table alias resolution checks against CTE aliases on fallback.
Fixed panic in `group_by_dynamic` with `group_by` and multiple chunks.
Fixed panic when using struct field as join key.
Allowed broadcast in `group_by` for `ApplyExpr` and `BinaryExpr`.
Fixed field metadata for nested categorical PyCapsule export.
Blocked predicate pushdown when `group_by` key values are changed.
Fixed Group-By aggregation problems caused by `AmortSeries`.
Stopped pushing down predicates past inserted cache nodes.
Allowed for negative time in `group_by_dynamic` iterator.
Re-enabled CPU feature check before import.
Corrected `any(ignore_nulls)` and an out-of-bounds access in `all`.
Fixed streaming `any`/`all` with `ignore_nulls=False`.
Fixed incorrect `join_asof` on a cast expression.
Fixed capitalisation of letters after numbers in `to_titlecase`.
Preserved null values in `pct_change`.
Raised a length-mismatch error on `over` with sliced groups.
Checked for duplicate names in `transpose`.
Followed Kleene logic in `any` / `all` for group-by.
Stopped optimizing a cross join into an IEJoin when order must be maintained.
Broadcasted `partition_by` columns in `over` expression.
Cleared index cache on stacked `df.filter` expressions.
Fixed the `'explode'` mapping strategy on scalar values.
Fixed repeated `with_row_index()` calls after `scan()` being silently ignored.
Returned the correct min and max for enums in group-by aggregations.
Fixed aggstate for `gather`.
Kept scalars for length preserving functions in `group_by`.
Fixed duplicate select panic.
Fixed the inconsistent result type of `list.sum()`.
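Several fixes above (`any(ignore_nulls)`, streaming `any`/`all` with `ignore_nulls=False`, and Kleene logic in group-by `any`/`all`) concern how nulls combine with booleans. The plain-Python model below sketches those semantics as we understand them from the Polars documentation, using `None` for null. The function names `kleene_any` and `kleene_all` are hypothetical and not part of the Polars API.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]], ignore_nulls: bool = True) -> Optional[bool]:
    """Model of `any`; None stands for null (hypothetical helper)."""
    seen_null = False
    for v in values:
        if v is True:
            return True  # True is absorbing for OR: one True decides the result
        if v is None:
            seen_null = True
    # No True found. Under Kleene logic an unknown (null) value could still
    # be True, so the answer is itself unknown.
    if seen_null and not ignore_nulls:
        return None
    return False


def kleene_all(values: Iterable[Optional[bool]], ignore_nulls: bool = True) -> Optional[bool]:
    """Model of `all`; None stands for null (hypothetical helper)."""
    seen_null = False
    for v in values:
        if v is False:
            return False  # False is absorbing for AND: one False decides the result
        if v is None:
            seen_null = True
    # No False found. A null could still be False, so the answer is unknown.
    if seen_null and not ignore_nulls:
        return None
    return True
```

In a group-by, the same rule is applied independently to each group. With `ignore_nulls=True`, nulls are simply skipped, so an all-null group yields `False` for `any` and `True` for `all`. With `ignore_nulls=False`, a null only matters when no absorbing value (`True` for `any`, `False` for `all`) is present.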

🔧 Affected Symbols

`LazyFrame.{sink,collect}_batches`, `dt.truncate`, `group_by_dynamic`, `ApplyExpr`, `BinaryExpr`, `scan_ipc`, `Expr.item`, `scan_parquet`, `dt.total_{}`, `name.replace`, `unnest`, `union`, `rolling_rank`, `arr.eval`, `list.agg`, `arr.agg`, `join_asof`, `pct_change`, `with_row_index`, `scan`