Change8

rs-0.52.0

📦 polars
38 features · 🐛 42 fixes · 🔧 20 symbols

Summary

This release focuses heavily on performance improvements across lazy evaluation, group-by operations, and I/O, alongside numerous bug fixes, especially in SQL handling and streaming aggregations. New features include batch collection methods for LazyFrames (`LazyFrame.{sink,collect}_batches`) and streaming-engine support for `ewm_mean`, `ewm_var`/`ewm_std`, and `rolling`.

✨ New Features

  • Added `LazyFrame.{sink,collect}_batches` methods.
  • Implemented deterministic import order for Python Polars package variants.
  • Added support for `ewm_var`/`ewm_std` in the streaming engine.
  • Made DSL-hash skippable.
  • Implemented streaming `{Expr,LazyFrame}.rolling`.
  • Set a `polars/<version>` user-agent.
  • Added `BIT_NOT` support to the SQL interface.
  • Added support for `BYTE_ARRAY`-backed Decimals in Parquet.
  • Added `allow_empty` flag to `item`.
  • Supported `ewm_mean()` in the streaming engine.
  • Improved row-count estimates.
  • Removed filtered scan paths in IR when possible.
  • Introduced a remote Polars MCP server.
  • Allowed local scans on Polars Cloud (configurable).
  • Added `Expr.item` to strictly extract a single value from an expression.
  • Added an environment variable to round-trip empty structs in Parquet.
  • Added `glob` parameter to `scan_ipc`.
  • Added `list.agg` and `arr.agg`.
  • Implemented `{Expr,Series}.rolling_rank()`.
  • Added support for `MergeSorted` in common subplan elimination (CSPE).
  • Applied CSPE recursively.
  • Added per-node metrics to the streaming engine.
  • Added `arr.eval`.
  • Added `nth_set_bit_u64()` with a unit test.
  • Added `separator` to `{Data,Lazy}Frame.unnest`.
  • Added `union()` function for unordered concatenation.
  • Added `name.replace` to the set of column rename options.
  • Allowed duration strings with a leading "+".
  • Dropped the now-unnecessary post-init `schema_overrides` cast when loading a `DataFrame` from a list of dicts.
  • Added support for UInt128 to pyo3-polars.
  • Implemented `maintain_order` for cross joins.
  • Added support for returning `dt.total_{}()` duration values as fractional numbers.
  • Supported scanning from `file:/path` URIs.
  • Logged which file the schema was sourced from, and which file caused an extra column error.
  • Added support for displaying lazy query plans in marimo notebooks without installing matplotlib or mermaid.
  • Added unstable `hidden_file_prefix` parameter to `scan_parquet`.
  • Switched to fixed-scale Decimals.
  • Added support for unsigned 128-bit integers.
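Streaming support for `ewm_mean()` is possible because an exponentially weighted mean can be updated one value at a time while carrying only constant-size state between values. Below is a minimal pure-Python sketch of the formula documented for `ewm_mean` (both the `adjust=True` weighted-average form and the `adjust=False` recursive form), assuming non-null inputs and a fixed smoothing factor `alpha`. The helper name `ewm_mean_stream` is hypothetical, and this is not Polars' streaming implementation.

```python
from typing import Iterable, Iterator, Optional


def ewm_mean_stream(
    values: Iterable[float], alpha: float, adjust: bool = True
) -> Iterator[float]:
    """Yield the exponentially weighted mean after each incoming value.

    Hypothetical helper: models the documented ewm_mean formula for
    non-null inputs only; it is not Polars' streaming operator.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    decay = 1.0 - alpha
    num = 0.0  # running sum of (1 - alpha)^i * x_{t-i}
    den = 0.0  # running sum of (1 - alpha)^i
    mean: Optional[float] = None  # previous output for adjust=False
    for x in values:
        if adjust:
            # Weighted average over every value seen so far, updated in O(1).
            num = x + decay * num
            den = 1.0 + decay * den
            yield num / den
        else:
            # Recursive form: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t.
            mean = x if mean is None else decay * mean + alpha * x
            yield mean
```

For example, `list(ewm_mean_stream([1.0, 2.0, 3.0], alpha=0.5, adjust=False))` gives `[1.0, 1.5, 2.25]`. Only `num`, `den`, and `mean` survive between values, so the input could arrive in arbitrary chunks without changing the result.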

🐛 Bug Fixes

  • Fixed an off-by-one in CSV `select(len())` when a comment prefix is set.
  • Fixed incorrect reshape on sliced lists.
  • Supported "index" as column name in `group_by` iterator.
  • Ensured `DSL_SCHEMA_HASH` is not affected by line endings.
  • Solved multiple issues relating to arena mutation in SQL subqueries.
  • Fixed panic in `dt.truncate` for invalid duration strings.
  • Stopped triggering `DeprecationWarning` from SQL "IN" constraints that use subqueries.
  • Returned the correct string-case `Expr` reprs.
  • Fixed `groups` update on slices with different offsets.
  • Fixed handling of `Null` dtype in `ApplyExpr` on `group_by`.
  • Raised an error instead of panicking for `all`/`any` on lists.
  • Fixed unique key names in streaming sort/top_k.
  • Ensured the `SQL` interface uses logical, not bitwise, behaviour for unary "NOT" operator.
  • Fixed a panic when a scan predicate produces a zero-length mask.
  • Ensured SQL table alias resolution checks against CTE aliases on fallback.
  • Fixed panic in `group_by_dynamic` with `group_by` and multiple chunks.
  • Fixed panic when using struct field as join key.
  • Allowed broadcast in `group_by` for `ApplyExpr` and `BinaryExpr`.
  • Fixed field metadata for nested categorical PyCapsule export.
  • Blocked predicate pushdown when `group_by` key values are changed.
  • Fixed Group-By aggregation problems caused by `AmortSeries`.
  • Stopped pushing down predicates past inserted cache nodes.
  • Allowed negative time in the `group_by_dynamic` iterator.
  • Re-enabled CPU feature check before import.
  • Corrected `any(ignore_nulls)` and fixed an out-of-bounds access in `all`.
  • Fixed streaming `any`/`all` with `ignore_nulls=False`.
  • Fixed incorrect `join_asof` on a casted expression.
  • Fixed capitalisation of letters after numbers in `to_titlecase`.
  • Preserved null values in `pct_change`.
  • Raised a length-mismatch error on `over` with sliced groups.
  • Added a duplicate-name check in `transpose`.
  • Followed Kleene logic in `any` / `all` for group-by.
  • Stopped optimizing a cross join into an IEJoin when order must be maintained.
  • Broadcasted `partition_by` columns in `over` expression.
  • Cleared index cache on stacked `df.filter` expressions.
  • Fixed 'explode' mapping strategy on scalar value.
  • Fixed repeated `with_row_index()` calls after `scan()` being silently ignored.
  • Correctly returned min and max for enums in group-by aggregation.
  • Fixed aggstate for `gather`.
  • Kept scalars for length preserving functions in `group_by`.
  • Fixed duplicate select panic.
  • Fixed the inconsistent result type of `list.sum()`.
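Several of the fixes above concern `any`/`all` and null handling (`ignore_nulls`, Kleene logic in group-by). Below is a minimal pure-Python model of the semantics described in the Polars documentation for `Expr.any` and `Expr.all`, using `None` to stand in for null. The helper names `kleene_any` and `kleene_all` are hypothetical; this is an illustration of the rules, not Polars code.

```python
from typing import Optional, Sequence

# A three-valued boolean: True, False, or None (null).
Bool3 = Optional[bool]


def kleene_any(values: Sequence[Bool3], ignore_nulls: bool = True) -> Bool3:
    """Model of any(): True if at least one value is True."""
    if any(v is True for v in values):
        return True
    if not ignore_nulls and any(v is None for v in values):
        # Kleene logic: no True was seen, but a null might have been True.
        return None
    return False


def kleene_all(values: Sequence[Bool3], ignore_nulls: bool = True) -> Bool3:
    """Model of all(): True unless some value is False."""
    if any(v is False for v in values):
        return False
    if not ignore_nulls and any(v is None for v in values):
        # Kleene logic: no False was seen, but a null might have been False.
        return None
    return True
```

With the default `ignore_nulls=True`, nulls are simply skipped, so `kleene_any([None, False])` is `False`; with `ignore_nulls=False` the same input yields `None`. In a group-by, the same rule would be evaluated independently for each group.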

🔧 Affected Symbols

`LazyFrame.{sink,collect}_batches`, `dt.truncate`, `group_by_dynamic`, `ApplyExpr`, `BinaryExpr`, `scan_ipc`, `Expr.item`, `scan_parquet`, `dt.total_{}`, `name.replace`, `unnest`, `union`, `rolling_rank`, `arr.eval`, `list.agg`, `arr.agg`, `join_asof`, `pct_change`, `with_row_index`, `scan`