py-1.34.0-beta.5
📦 polars
✨ 17 features · 🐛 42 fixes · 🔧 19 symbols
Summary
This release introduces new lazy sink/collect batch methods, enhances performance across various scan and expression operations, and fixes numerous bugs related to data types, streaming, and cloud storage interactions.
✨ New Features
- Add LazyFrame.{sink,collect}_batches
- Deterministic import order for Python Polars package variants
- Implement maintain_order for cross join
- Add support for outputting dt.total_{}() duration values as fractional numbers
- Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine
- Support scanning from file:/path URIs
- Log which file the schema was sourced from, and which file caused an extra column error
- Add support for displaying the lazy query plan in marimo notebooks without installing matplotlib or mermaid
- Add unstable hidden_file_prefix parameter to scan_parquet
- Use fixed-scale Decimals
- Add support for unsigned 128-bit integers
- Add unstable pl.Config.set_default_credential_provider
- Roundtrip BinaryOffset type through Parquet
- Add opt-in unstable functionality to load interval types as Struct
- Support reading parquet metadata from cloud storage
- Add user guide section on AWS role assumption
- Support unique / n_unique / arg_unique for array columns
🐛 Bug Fixes
- Make Categories pickleable
- Shift on array within list
- Fix handling of AggregatedScalar in ApplyExpr single input
- Support reading of mixed compressed/uncompressed IPC buffers
- Overflow in slice-slice optimization
- Package discovery for setuptools
- Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction
- Remove inclusion of polars dir in runtime sdist/wheel
- Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous
- Widen from_dicts to Iterable[Mapping[str, Any]]
- Fix unsupported arrow type Dictionary error in scan_iceberg()
- Raise Exception instead of panic when unnest on non-struct column
- Include missing feature dependency from polars-stream/diff to polars-plan/abs
- Newline escaping in streaming show_graph
- Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first
- Sink batches early stop on in-memory engine
- More precisely model expression ordering requirements
- Panic in zero-weight rolling mean/var
- Decimal <-> literal arithmetic supertype rules
- Match various aggregation return types in the streaming engine with the in-memory engine
- Validate list type for list expressions in planner
- Fix scan_iceberg() storage options not taking effect
- Have log() prioritize the leftmost dtype for its output dtype
- CSV pl.len() was incorrect
- Add support for float inputs for duration types
- Roundtrip empty string through hive partitioning
- Fix potential OOB writes in unaligned IPC read
- Fix regression error when scanning AWS presigned URL
- Make PlPath::join for cloud paths replace on absolute paths
- Correct dtype for cum_agg in streaming engine
- Restore support for np.datetime64() in pl.lit()
- Ignore Iceberg list element ID if missing
- Fix panic on streaming full join with coalesce
- Fix AggState on all_literal in BinaryExpr
- Show IR sort options in explain
- Fix schema on ApplyExpr with single row literal in agg context
- Fix planner schema for dividing pl.Float32 by int
- Fix panic scanning from AWS legacy global endpoint URL
- Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data
- Do not propagate struct of nulls with null
- Be stricter with invalid NDJSON input when ignore_errors=False
- Implement approx_n_unique for temporal dtypes and Null
🔧 Affected Symbols
LazyFrame.{sink,collect}_batches, dt.total_{}, read_excel, scan_parquet, pl.Config.set_default_credential_provider, Categories, AggregatedScalar, ApplyExpr, scan_iceberg(), dt.month_end, from_dicts, Expr.reshape, log(), pl.len(), PlPath::join, pl.lit(), BinaryExpr, explain, iterable_to_pydf