
py-1.34.0-beta.4

📦 polars
✨ 14 features · 🐛 33 fixes · 🔧 13 symbols

Summary

This release introduces new batch collection methods (`LazyFrame.{sink,collect}_batches`), significant performance optimizations across scanning and expressions, and numerous bug fixes, especially around Iceberg and streaming operations.

✨ New Features

  • Added `LazyFrame.{sink,collect}_batches` methods.
  • Ensured deterministic import order for Python Polars package variants.
  • Support scanning from `file:/path` URIs.
  • Log which file the schema was sourced from, and which file caused an extra column error.
  • Added support for displaying lazy query plans in marimo notebooks without installing matplotlib or mermaid.
  • Added unstable `hidden_file_prefix` parameter to `scan_parquet`.
  • Use fixed-scale Decimals.
  • Added support for unsigned 128-bit integers.
  • Added unstable `pl.Config.set_default_credential_provider`.
  • Roundtrip `BinaryOffset` type through Parquet.
  • Added opt-in unstable functionality to load interval types as `Struct`.
  • Support reading parquet metadata from cloud storage.
  • Added user guide section on AWS role assumption.
  • Support `unique` / `n_unique` / `arg_unique` for `array` columns.

🐛 Bug Fixes

  • Widen `from_dicts` to accept `Iterable[Mapping[str, Any]]`.
  • Fix `unsupported arrow type Dictionary` error in `scan_iceberg()`.
  • Raise an exception instead of panicking when unnesting a non-struct column.
  • Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs`.
  • Fix newline escaping in streaming `show_graph`.
  • Do not allow inferring (`-1`) any dimension except the first in `Expr.reshape`.
  • Fix early stopping of sink batches on the in-memory engine.
  • More precisely model expression ordering requirements.
  • Fix panic in zero-weight rolling mean/var.
  • Fix Decimal <-> literal arithmetic supertype rules.
  • Match various aggregation return types in the streaming engine with the in-memory engine.
  • Validate list type for list expressions in planner.
  • Fix `scan_iceberg()` storage options not taking effect.
  • Have `log()` prioritize the leftmost dtype for its output dtype.
  • Fix incorrect `pl.len()` results on CSV scans.
  • Add support for float inputs for duration types.
  • Roundtrip empty string through hive partitioning.
  • Fix potential OOB writes in unaligned IPC read.
  • Fix regression error when scanning AWS presigned URL.
  • Make `PlPath::join` for cloud paths replace on absolute paths.
  • Correct dtype for cum_agg in streaming engine.
  • Restore support for `np.datetime64()` in `pl.lit()`.
  • Ignore Iceberg list element ID if missing.
  • Fix panic on streaming full join with coalesce.
  • Fix `AggState` on `all_literal` in `BinaryExpr`.
  • Show IR sort options in `explain`.
  • Fix schema on `ApplyExpr` with single row `literal` in agg context.
  • Fix planner schema for dividing `pl.Float32` by int.
  • Fix panic scanning from AWS legacy global endpoint URL.
  • Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data.
  • Do not propagate struct of nulls with null.
  • Be stricter with invalid NDJSON input when `ignore_errors=False`.
  • Implement `approx_n_unique` for temporal dtypes and Null.

🔧 Affected Symbols

`LazyFrame.{sink,collect}_batches` · `scan_iceberg` · `Expr.reshape` · `pl.len()` · `pl.lit()` · `AggState` · `BinaryExpr` · `explain` · `ApplyExpr` · `pl.Float32` · `iterable_to_pydf` · `pl.Config.set_default_credential_provider` · `scan_parquet`