Changelog

py-1.34.0-beta.5

📦 polars
✨ 17 features · 🐛 42 fixes · 🔧 19 symbols

Summary

This release introduces new lazy sink/collect batch methods, enhances performance across various scan and expression operations, and fixes numerous bugs related to data types, streaming, and cloud storage interactions.

✨ New Features

  • Add LazyFrame.{sink,collect}_batches
  • Deterministic import order for Python Polars package variants
  • Implement maintain_order for cross join
  • Add support for outputting dt.total_{}() duration values as fractional numbers
  • Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine
  • Support scanning from file:/path URIs
  • Log which file the schema was sourced from, and which file caused an extra column error
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid
  • Add unstable hidden_file_prefix parameter to scan_parquet
  • Use fixed-scale Decimals
  • Add support for unsigned 128-bit integers
  • Add unstable pl.Config.set_default_credential_provider
  • Roundtrip BinaryOffset type through Parquet
  • Add opt-in unstable functionality to load interval types as Struct
  • Support reading parquet metadata from cloud storage
  • Add user guide section on AWS role assumption
  • Support unique / n_unique / arg_unique for array columns

🐛 Bug Fixes

  • Make Categories pickleable
  • Shift on array within list
  • Fix handling of AggregatedScalar in ApplyExpr single input
  • Support reading of mixed compressed/uncompressed IPC buffers
  • Overflow in slice-slice optimization
  • Package discovery for setuptools
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction
  • Remove inclusion of polars dir in runtime sdist/wheel
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous
  • Widen from_dicts to Iterable[Mapping[str, Any]]
  • Fix unsupported arrow type Dictionary error in scan_iceberg()
  • Raise an exception instead of panicking when unnest is called on a non-struct column
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs
  • Newline escaping in streaming show_graph
  • Only allow dimension inference (-1) on the first Expr.reshape dimension
  • Sink batches early stop on in-memory engine
  • More precisely model expression ordering requirements
  • Panic in zero-weight rolling mean/var
  • Decimal <-> literal arithmetic supertype rules
  • Match various aggregation return types in the streaming engine with the in-memory engine
  • Validate list type for list expressions in planner
  • Fix scan_iceberg() storage options not taking effect
  • Have log() prioritize the leftmost dtype for its output dtype
  • Fix incorrect pl.len() result on CSV scans
  • Add support for float inputs for duration types
  • Roundtrip empty string through hive partitioning
  • Fix potential OOB writes in unaligned IPC read
  • Fix regression error when scanning AWS presigned URL
  • Make PlPath::join for cloud paths replace on absolute paths
  • Correct dtype for cum_agg in streaming engine
  • Restore support for np.datetime64() in pl.lit()
  • Ignore Iceberg list element ID if missing
  • Fix panic on streaming full join with coalesce
  • Fix AggState on all_literal in BinaryExpr
  • Show IR sort options in explain
  • Fix schema on ApplyExpr with single row literal in agg context
  • Fix planner schema for dividing pl.Float32 by int
  • Fix panic scanning from AWS legacy global endpoint URL
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data
  • Do not propagate struct of nulls with null
  • Be stricter with invalid NDJSON input when ignore_errors=False
  • Implement approx_n_unique for temporal dtypes and Null

🔧 Affected Symbols

LazyFrame.{sink,collect}_batches · dt.total_{} · read_excel · scan_parquet · pl.Config.set_default_credential_provider · Categories · AggregatedScalar · ApplyExpr · scan_iceberg() · dt.month_end · from_dicts · Expr.reshape · log() · pl.len() · PlPath::join · pl.lit() · BinaryExpr · explain · iterable_to_pydf