py-1.30.0-beta.1
📦 polars
✨ 20 features · 🐛 37 fixes · ⚡ 1 deprecation · 🔧 18 symbols
Summary
This release focuses heavily on performance improvements, including optimized initialization and parallelism, alongside numerous bug fixes across SQL, I/O, and expression evaluation. New features include enhanced schema matching, improved time-string parsing, and better support for various data types and operations.
✨ New Features
- Increase default cross-file parallelism limit for new-streaming multiscan.
- Add elementwise execution mode for `list.eval`.
- Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors.
- Add streaming cross-join node.
- Support `BinaryOffset` in `search_sorted`.
- Add `nulls_equal` flag to `list/arr.contains`.
- Implement `LazyFrame.match_to_schema`.
- Improved time-string parsing and inference (generally, and via the SQL interface).
- Allow `.over` to be called without `partition_by`.
- Support `AnyValue` translation from `PyMapping` values.
- Support inference of `Int128` dtype from databases that support it.
- Add options to write Parquet field metadata.
- Add `cast_options` parameter to control type casting in `scan_parquet`.
- Allow casting List<UInt8> to Binary.
- Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT`.
- Support use of literal values as "other" when evaluating `Series.zip_with`.
- Support reading and writing custom file-level Parquet metadata.
- Support grouping by `pl.Array`.
- Preserve exception type and traceback for errors raised from Python.
- Use fixed-width font in streaming phys plan graph.
🐛 Bug Fixes
- Respect BinaryOffset metadata.
- Correct the output order of `PartitionByKey` and `PartitionParted`.
- Fallback to non-strict casting for deprecated casts.
- Handle sliced out remainder for bitmaps.
- Don't merge `Enum` categories on append.
- Fix unnest() not working on empty struct columns.
- Fix the default value type in `Schema` init.
- Correct name in `unnest` error message.
- Provide "schema" to `DataFrame`, even if empty.
- Properly account for nulls in the `is_not_nan` check made in `drop_nans`.
- Fix incorrect result from SQL `count(*)` with `PARTITION BY`.
- Fix deadlock joining scanned tables with low thread count.
- Don't allow deserializing incompatible DSL.
- Fix incorrect null dtype from binary ops in empty `group_by`.
- Don't mark `str.replace_many` with Mapping as deprecated.
- Gzip has maximum compression of 9, not 10.
- Fix predicate pushdown of fallible expressions.
- Fix `index out of bounds` panic when scanning Hugging Face.
- Fix panic on `group_by` with literal and empty rows.
- Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()`.
- Bump argminmax to 0.6.3.
- Fix DSL version deserialization endianness.
- Allow Expr.round() to be called on integer dtypes.
- Fix panic when filtering based on row index column in parquet.
- Fix WASM and Pyodide compilation.
- Resolve `get()` SchemaMismatch panic.
- Fix panic in `group_by_dynamic` on single-row DataFrame with `group_by`.
- Add `new_streaming` feature to `polars` crate.
- Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets, which start on Mondays).
- Fix interpolate on dtype Decimal.
- Fix CSV row count skipping the last line when the file did not end with a newline.
- Make nested strict casting actually strict.
- Make `replace` and `replace_strict` mapping use list literals.
- Allow pivot on `Time` column.
- Fix error when providing CSV schema with extra columns.
- Fix panic on bitwise op between `Series` and `Expr`.
- Fix multi-selector regex expansion.
🔧 Affected Symbols
`list.eval`, `from_records`, `LazyFrame.match_to_schema`, `list/arr.contains`, `Series.zip_with`, `scan_parquet`, `group_by`, `drop_nans`, `drop_nulls`, `Expr.round`, `dt.truncate`, `str.replace_many`, `unnest`, `Schema`, `drop`, `fill`, `insert_column`, `join`
⚡ Deprecations
- Add support for PEP 702 `@deprecated` decorator behaviour, which may change how deprecation warnings are raised or handled.