Change8

py-1.30.0-beta.1

📦 polars
✨ 20 features · 🐛 37 fixes · ⚡ 1 deprecation · 🔧 18 symbols

Summary

This release focuses heavily on performance improvements, including optimized initialization and parallelism, alongside numerous bug fixes across SQL, I/O, and expression evaluation. New features include enhanced schema matching, improved time-string parsing, and better support for various data types and operations.

✨ New Features

  • Increase default cross-file parallelism limit for new-streaming multiscan.
  • Add elementwise execution mode for `list.eval`.
  • Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors.
  • Add streaming cross-join node.
  • Support `BinaryOffset` in `search_sorted`.
  • Add `nulls_equal` flag to `list/arr.contains`.
  • Implement `LazyFrame.match_to_schema`.
  • Improve time-string parsing and inference (generally, and via the SQL interface).
  • Allow `.over` to be called without `partition_by`.
  • Support `AnyValue` translation from `PyMapping` values.
  • Support inference of `Int128` dtype from databases that support it.
  • Add options to write Parquet field metadata.
  • Add `cast_options` parameter to control type casting in `scan_parquet`.
  • Allow casting List<UInt8> to Binary.
  • Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT`.
  • Support use of literal values as "other" when evaluating `Series.zip_with`.
  • Allow reading and writing custom file-level Parquet metadata.
  • Support grouping by `pl.Array`.
  • Preserve exception type and traceback for errors raised from Python.
  • Use fixed-width font in the streaming physical plan graph.
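
Several of the features above revolve around schema handling, most notably the new `LazyFrame.match_to_schema`. The idea can be pictured with a hand-rolled, stdlib-only sketch — an illustration of the concept only, not polars' implementation; the function name and behaviour here are assumptions:

```python
def match_record_to_schema(record: dict, schema: dict, strict_extra: bool = True) -> dict:
    """Illustrative sketch of schema matching: fill missing columns with
    None, cast present values to the schema's types, and (optionally)
    reject columns that are not part of the target schema."""
    extra = set(record) - set(schema)
    if extra and strict_extra:
        raise ValueError(f"extra columns: {sorted(extra)}")
    return {
        name: typ(record[name]) if record.get(name) is not None else None
        for name, typ in schema.items()
    }

# Missing column "b" is filled with None; "a" is cast to int.
row = match_record_to_schema({"a": "1"}, {"a": int, "b": float}, strict_extra=False)
```

The real polars API operates lazily on whole frames and exposes finer-grained controls; this sketch only conveys the fill/cast/reject shape of the operation.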

🐛 Bug Fixes

  • Respect `BinaryOffset` metadata.
  • Correct the output order of `PartitionByKey` and `PartitionParted`.
  • Fall back to non-strict casting for deprecated casts.
  • Handle sliced out remainder for bitmaps.
  • Don't merge `Enum` categories on append.
  • Fix `unnest()` not working on empty struct columns.
  • Fix the default value type in `Schema` init.
  • Correct name in `unnest` error message.
  • Provide "schema" to `DataFrame`, even if empty.
  • Properly account for nulls in the `is_not_nan` check made in `drop_nans`.
  • Fix incorrect result from SQL `count(*)` with `PARTITION BY`.
  • Fix deadlock joining scanned tables with low thread count.
  • Don't allow deserializing incompatible DSL.
  • Fix incorrect null dtype from binary ops in empty `group_by`.
  • Don't mark `str.replace_many` with Mapping as deprecated.
  • Cap gzip compression level at its true maximum of 9, not 10.
  • Fix predicate pushdown of fallible expressions.
  • Fix `index out of bounds` panic when scanning Hugging Face.
  • Fix panic on `group_by` with literal and empty rows.
  • Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()`.
  • Bump argminmax to 0.6.3.
  • Fix DSL version deserialization endianness.
  • Allow Expr.round() to be called on integer dtypes.
  • Fix panic when filtering based on row index column in parquet.
  • Fix WASM and Pyodide compilation.
  • Resolve `get()` `SchemaMismatch` panic.
  • Fix panic in `group_by_dynamic` on single-row df with `group_by`.
  • Add `new_streaming` feature to `polars` crate.
  • Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets, which start on Mondays).
  • Fix `interpolate` on `Decimal` dtype.
  • Fix CSV row count skipping the last line when the file does not end with a newline.
  • Make nested strict casting actually strict.
  • Make `replace` and `replace_strict` mapping use list literals.
  • Allow pivot on `Time` column.
  • Fix error when providing CSV schema with extra columns.
  • Fix panic on bitwise op between `Series` and `Expr`.
  • Fix multi-selector regex expansion.
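
The gzip fix above reflects a constraint of the DEFLATE format itself: zlib/gzip compression levels run from 0 to 9, and anything higher is invalid. A quick stdlib check (pure illustration, unrelated to polars' internals):

```python
import zlib

data = b"polars " * 100

# Level 9 is the strongest valid setting and round-trips cleanly.
packed = zlib.compress(data, level=9)
restored = zlib.decompress(packed)

# Level 10 is out of range; zlib rejects it with an error.
try:
    zlib.compress(data, level=10)
    level_10_ok = True
except zlib.error:
    level_10_ok = False
```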

🔧 Affected Symbols

`list.eval` · `from_records` · `LazyFrame.match_to_schema` · `list/arr.contains` · `Series.zip_with` · `scan_parquet` · `group_by` · `drop_nans` · `drop_nulls` · `Expr.round` · `dt.truncate` · `str.replace_many` · `unnest` · `Schema` · `drop` · `fill` · `insert_column` · `join`

⚡ Deprecations

  • Added support for the PEP 702 `@deprecated` decorator, which may change how deprecation warnings are emitted and detected.
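
The behaviour PEP 702 standardizes can be sketched with a minimal stdlib-only decorator — this illustrates the pattern, not polars' (or `warnings.deprecated`'s) actual implementation:

```python
import functools
import warnings

def deprecated(message: str):
    """Minimal sketch of a PEP 702-style decorator: calling the wrapped
    function emits a DeprecationWarning but still runs it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        # PEP 702 also records the message so static checkers can see it.
        wrapper.__deprecated__ = message
        return wrapper
    return decorate

@deprecated("use new_api() instead")
def old_api(x):
    return x * 2
```

Python 3.13 ships this behaviour as `warnings.deprecated` (with a `typing_extensions` backport); the sketch above only shows the runtime-warning half of what PEP 702 specifies.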