py-1.34.0-beta.4
📦 polars
✨ 14 features · 🐛 33 fixes · 🔧 13 symbols
Summary
This release introduces new batch collection methods (`LazyFrame.{sink,collect}_batches`), significant performance optimizations across scanning and expressions, and numerous bug fixes, especially around Iceberg and streaming operations.
✨ New Features
- Added `LazyFrame.{sink,collect}_batches` methods (see the sketch after this list).
- Ensured deterministic import order for Python Polars package variants.
- Support scanning from `file:/path` URIs (see the scanning example after this list).
- Log which file the schema was sourced from, and which file caused an extra column error.
- Added support for displaying the lazy query plan in marimo notebooks without needing to install matplotlib or mermaid.
- Added unstable `hidden_file_prefix` parameter to `scan_parquet` (see the scanning example after this list).
- Use fixed-scale Decimals.
- Added support for unsigned 128-bit integers.
- Added unstable `pl.Config.set_default_credential_provider` (see the scanning example after this list).
- Roundtrip `BinaryOffset` type through Parquet.
- Added opt-in unstable functionality to load interval types as `Struct`.
- Support reading Parquet metadata from cloud storage (example after this list).
- Added user guide section on AWS role assumption.
- Support `unique` / `n_unique` / `arg_unique` for `array` columns (example after this list).
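
The batch-collection sketch referenced in the first bullet. The exact signatures of `collect_batches` and `sink_batches` are assumptions inferred from the feature name, so treat this as illustrative only:

```python
import polars as pl

def process(df: pl.DataFrame) -> None:
    # Hypothetical per-batch handler.
    print(df.height)

lf = pl.scan_parquet("data/*.parquet").filter(pl.col("amount") > 0)

# Assumed: collect_batches() yields DataFrame chunks instead of
# materializing the whole result at once.
for batch in lf.collect_batches():
    process(batch)

# Assumed: sink_batches() drives the stream itself, invoking the
# callback once per batch.
lf.sink_batches(process)
```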
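
The scanning-related additions combined in one sketch. `hidden_file_prefix` and `set_default_credential_provider` are marked unstable, and the argument shapes shown here are assumptions:

```python
import polars as pl

# file:/path URIs are now accepted by the scan functions.
lf_local = pl.scan_parquet("file:///data/sales.parquet")

# Assumed shape: skip files whose names start with the given prefix,
# e.g. Spark-style "_SUCCESS" marker files (unstable).
lf_cloud = pl.scan_parquet("s3://bucket/table/", hidden_file_prefix="_")

# Assumed shape: register a process-wide default credential provider
# (unstable) instead of passing one to every scan call.
pl.Config.set_default_credential_provider(
    pl.CredentialProviderAWS(profile_name="analytics")  # hypothetical profile
)
```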
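
Parquet metadata can now be read straight from object storage. A small example, assuming `read_parquet_metadata` resolves cloud URIs the same way the scan functions do:

```python
import polars as pl

# File-level key/value metadata, fetched without downloading the data pages.
meta = pl.read_parquet_metadata("s3://bucket/table/part-0.parquet")
print(meta)
```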
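
And the `unique` family on fixed-size `Array` columns:

```python
import polars as pl

df = pl.DataFrame(
    {"a": [[1, 2], [1, 2], [3, 4]]},
    schema={"a": pl.Array(pl.Int64, 2)},
)

df.select(pl.col("a").n_unique())    # 2 distinct values
df.select(pl.col("a").unique())      # the two distinct arrays
df.select(pl.col("a").arg_unique())  # first index of each distinct value
```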
🐛 Bug Fixes
- Widen `from_dicts` to accept `Iterable[Mapping[str, Any]]` (example after this list).
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()`.
- Raise an exception instead of panicking when calling `unnest` on a non-struct column.
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs`.
- Fix newline escaping in streaming `show_graph`.
- Only allow inferring (`-1`) the first dimension in `Expr.reshape` (example after this list).
- Fix `sink_batches` early stop on the in-memory engine.
- More precisely model expression ordering requirements.
- Fix panic in zero-weight rolling mean/var.
- Fix Decimal <-> literal arithmetic supertype rules.
- Match various aggregation return types in the streaming engine with the in-memory engine.
- Validate list type for list expressions in the planner.
- Fix `scan_iceberg()` storage options not taking effect.
- Have `log()` prioritize the leftmost dtype for its output dtype.
- Fix incorrect `pl.len()` results on CSV scans.
- Add support for float inputs for duration types.
- Roundtrip empty string through hive partitioning.
- Fix potential out-of-bounds writes in unaligned IPC reads.
- Fix regression when scanning from an AWS presigned URL.
- Make `PlPath::join` for cloud paths replace the base path when joining an absolute path.
- Correct dtype for `cum_agg` in the streaming engine.
- Restore support for `np.datetime64()` in `pl.lit()` (example after this list).
- Ignore Iceberg list element ID if missing.
- Fix panic on streaming full join with coalesce.
- Fix `AggState` on `all_literal` in `BinaryExpr`.
- Show IR sort options in `explain`.
- Fix schema on `ApplyExpr` with single row `literal` in agg context.
- Fix planner schema for dividing `pl.Float32` by int.
- Fix panic scanning from AWS legacy global endpoint URL.
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data.
- Do not propagate struct of nulls with null.
- Be stricter with invalid NDJSON input when `ignore_errors=False` (example after this list).
- Implement `approx_n_unique` for temporal dtypes and Null.
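
The widened `from_dicts` annotation means generators of mappings now type-check directly:

```python
from typing import Any, Iterator

import polars as pl

def rows() -> Iterator[dict[str, Any]]:
    for i in range(3):
        yield {"id": i, "value": i * 0.5}

# Accepted as Iterable[Mapping[str, Any]] rather than a concrete sequence.
df = pl.from_dicts(rows())
```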
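
The `Expr.reshape` change in practice: `-1` (infer) is only accepted for the first dimension:

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3, 4, 5, 6]})

# OK: infer the leading dimension -> shape (3, 2).
df.select(pl.col("a").reshape((-1, 2)))

# Now rejected: -1 is no longer allowed outside the first dimension.
# df.select(pl.col("a").reshape((2, -1)))
```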
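
The restored `pl.lit()` behavior:

```python
import numpy as np
import polars as pl

# Works again after the regression fix.
print(pl.select(pl.lit(np.datetime64("2025-01-01T12:00:00"))))
```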
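
And the stricter NDJSON handling, sketched with an in-memory buffer (the exact error type raised is an assumption):

```python
import io

import polars as pl

data = b'{"a": 1}\n{"a": }\n{"a": 3}\n'  # second line is malformed

try:
    # ignore_errors=False (the default) now rejects the malformed line.
    pl.read_ndjson(io.BytesIO(data))
except pl.exceptions.ComputeError as exc:
    print("rejected:", exc)

# ignore_errors=True still tolerates it.
df = pl.read_ndjson(io.BytesIO(data), ignore_errors=True)
```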
🔧 Affected Symbols
`LazyFrame.{sink,collect}_batches`, `scan_iceberg`, `Expr.reshape`, `pl.len()`, `pl.lit()`, `AggState`, `BinaryExpr`, `explain`, `ApplyExpr`, `pl.Float32`, `iterable_to_pydf`, `pl.Config.set_default_credential_provider`, `scan_parquet`