py-1.34.0-beta.3
📦 polars
✨ 14 features · 🐛 33 fixes · 🔧 17 symbols
Summary
This release introduces new batch collection methods for LazyFrames and enhances performance across various scan types, alongside numerous bug fixes for stability and correctness in streaming and aggregation operations.
✨ New Features
- Added `LazyFrame.{sink,collect}_batches` methods (see the sketch after this list).
- Implemented deterministic import order for Python Polars package variants.
- Added support for scanning from file:/path URIs.
- Logging now indicates which file the schema was sourced from, and which file caused an extra column error.
- Added support for displaying the lazy query plan in marimo notebooks without requiring matplotlib or mermaid.
- Added unstable `hidden_file_prefix` parameter to `scan_parquet`.
- Switched to using fixed-scale Decimals.
- Added support for unsigned 128-bit integers.
- Added unstable `pl.Config.set_default_credential_provider`.
- Enabled roundtrip of `BinaryOffset` type through Parquet.
- Added opt-in unstable functionality to load interval types as `Struct`.
- Added support for reading parquet metadata from cloud storage.
- Added user guide section on AWS role assumption.
- Added support for `unique` / `n_unique` / `arg_unique` for `array` columns.
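
A minimal sketch of the new batch collection API, assuming `collect_batches()` takes no required arguments and yields eager `DataFrame` chunks (the method is new in this beta, so the exact signature may differ):

```python
import polars as pl

lf = pl.LazyFrame({"a": range(10)}).with_columns(b=pl.col("a") * 2)

# Assumption: collect_batches() yields DataFrame chunks instead of
# materializing the whole result at once, so each batch can be
# processed independently.
for batch in lf.collect_batches():
    print(batch.shape)
```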
🐛 Bug Fixes
- Widened `from_dicts` input type to `Iterable[Mapping[str, Any]]` (see the sketch after this list).
- Fixed `unsupported arrow type Dictionary` error in `scan_iceberg()`.
- Now raises an Exception instead of panicking when unnesting on a non-struct column.
- Included missing feature dependency from `polars-stream/diff` to `polars-plan/abs`.
- Fixed newline escaping in streaming show_graph.
- Prevented inferring dimension (`-1`) on any `Expr.reshape` dimension except the first.
- Sink batches now stop early on the in-memory engine.
- More precisely modeled expression ordering requirements.
- Fixed panic in zero-weight rolling mean/var.
- Corrected Decimal <-> literal arithmetic supertype rules.
- Matched various aggregation return types in the streaming engine with the in-memory engine.
- Validated list type for list expressions in the planner.
- Fixed `scan_iceberg()` storage options not taking effect.
- Made `log()` prioritize the leftmost dtype for its output dtype.
- Fixed incorrect `pl.len()` calculation for CSV.
- Added support for float inputs for duration types.
- Fixed roundtrip of empty string through hive partitioning.
- Fixed potential OOB writes in unaligned IPC read.
- Fixed regression error when scanning AWS presigned URL.
- Made `PlPath::join` on cloud paths replace the base when joining an absolute path.
- Corrected dtype for cum_agg in the streaming engine.
- Restored support for `np.datetime64()` in `pl.lit()`.
- Ignored Iceberg list element ID if missing.
- Fixed panic on streaming full join with coalesce.
- Fixed `AggState` on `all_literal` in `BinaryExpr`.
- Included IR sort options in `explain` output.
- Fixed benchmark CI import.
- Fixed planner schema for dividing `pl.Float32` by int.
- Fixed panic scanning from AWS legacy global endpoint URL.
- Fixed `iterable_to_pydf(..., infer_schema_length=None)` to scan all data.
- Stopped propagating struct of nulls with null.
- Became stricter with invalid NDJSON input when `ignore_errors=False`.
- Implemented `approx_n_unique` for temporal dtypes and Null.
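
As an illustration of the widened `from_dicts` input type, an arbitrary iterable of mappings, such as a generator, should now be accepted directly; a minimal sketch:

```python
import polars as pl

# from_dicts now accepts any Iterable[Mapping[str, Any]],
# so a generator works without first building a list.
rows = ({"x": i, "y": i * i} for i in range(3))
df = pl.from_dicts(rows)
print(df)
```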
🔧 Affected Symbols
`LazyFrame.{sink,collect}_batches`, `scan_iceberg`, `pl.Config.set_default_credential_provider`, `scan_parquet`, `pl.lit`, `Expr.reshape`, `log()`, `pl.len()`, `PlPath::join`, `cum_agg`, `np.datetime64()`, `AggState`, `BinaryExpr`, `explain`, `ApplyExpr`, `iterable_to_pydf`, `approx_n_unique`