Change8

py-1.30.0-beta.1

📦 polars
✨ 20 features · 🐛 37 fixes · ⚡ 1 deprecation · 🔧 18 symbols

Summary

This release focuses heavily on performance improvements, including optimized initialization and parallelism, alongside numerous bug fixes across SQL, I/O, and expression evaluation. New features include enhanced schema matching, improved time-string parsing, and better support for various data types and operations.

✨ New Features

  • Increase default cross-file parallelism limit for new-streaming multiscan.
  • Add elementwise execution mode for `list.eval`.
  • Support optimised init from non-dict `Mapping` objects in `from_records` and frame/series constructors.
  • Add streaming cross-join node.
  • Support `BinaryOffset` in `search_sorted`.
  • Add `nulls_equal` flag to `list/arr.contains`.
  • Implement `LazyFrame.match_to_schema`.
  • Improve time-string parsing and inference (generally, and via the SQL interface).
  • Allow `.over` to be called without `partition_by`.
  • Support `AnyValue` translation from `PyMapping` values.
  • Support inference of `Int128` dtype from databases that support it.
  • Add options to write Parquet field metadata.
  • Add `cast_options` parameter to control type casting in `scan_parquet`.
  • Allow casting List<UInt8> to Binary.
  • Allow setting of regex size limit using `POLARS_REGEX_SIZE_LIMIT`.
  • Support use of literal values as "other" when evaluating `Series.zip_with`.
  • Allow reading and writing custom file-level Parquet metadata.
  • Support grouping by `pl.Array`.
  • Preserve exception type and traceback for errors raised from Python.
  • Use fixed-width font in the streaming physical plan graph.
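
Several of the features above revolve around schema handling, most notably the new `LazyFrame.match_to_schema`. The idea can be pictured with a hand-rolled, stdlib-only sketch — an illustration of the concept only, not polars' implementation; the function name and behaviour here are assumptions:

```python
def match_record_to_schema(record: dict, schema: dict, strict_extra: bool = True) -> dict:
    """Illustrative sketch of schema matching: fill missing columns with
    None, cast present values to the schema's types, and (optionally)
    reject columns that are not part of the target schema."""
    extra = set(record) - set(schema)
    if extra and strict_extra:
        raise ValueError(f"extra columns: {sorted(extra)}")
    return {
        name: typ(record[name]) if record.get(name) is not None else None
        for name, typ in schema.items()
    }

# Missing column "b" is filled with None; "a" is cast to int.
row = match_record_to_schema({"a": "1"}, {"a": int, "b": float}, strict_extra=False)
```

The real polars API operates lazily on whole frames and exposes finer-grained controls; this sketch only conveys the fill/cast/reject shape of the operation.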

🐛 Bug Fixes

  • Respect `BinaryOffset` metadata.
  • Correct the output order of `PartitionByKey` and `PartitionParted`.
  • Fall back to non-strict casting for deprecated casts.
  • Handle sliced out remainder for bitmaps.
  • Don't merge `Enum` categories on append.
  • Fix `unnest()` not working on empty struct columns.
  • Fix the default value type in `Schema` init.
  • Correct name in `unnest` error message.
  • Provide "schema" to `DataFrame`, even if empty.
  • Properly account for nulls in the `is_not_nan` check made in `drop_nans`.
  • Fix incorrect result from SQL `count(*)` with `PARTITION BY`.
  • Fix deadlock joining scanned tables with low thread count.
  • Don't allow deserializing incompatible DSL.
  • Fix incorrect null dtype from binary ops in empty `group_by`.
  • Don't mark `str.replace_many` with Mapping as deprecated.
  • Cap gzip compression level at its true maximum of 9, not 10.
  • Fix predicate pushdown of fallible expressions.
  • Fix `index out of bounds` panic when scanning Hugging Face.
  • Fix panic on `group_by` with literal and empty rows.
  • Return input instead of panicking if empty subset in `drop_nulls()` and `drop_nans()`.
  • Bump argminmax to 0.6.3.
  • Fix DSL version deserialization endianness.
  • Allow Expr.round() to be called on integer dtypes.
  • Fix panic when filtering based on row index column in parquet.
  • Fix WASM and Pyodide compilation.
  • Resolve `get()` `SchemaMismatch` panic.
  • Fix panic in `group_by_dynamic` on single-row df with `group_by`.
  • Add `new_streaming` feature to `polars` crate.
  • Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets, which start on Mondays).
  • Fix `interpolate` on `Decimal` dtype.
  • Fix CSV row count skipping the last line when the file does not end with a newline.
  • Make nested strict casting actually strict.
  • Make `replace` and `replace_strict` mapping use list literals.
  • Allow pivot on `Time` column.
  • Fix error when providing CSV schema with extra columns.
  • Fix panic on bitwise op between `Series` and `Expr`.
  • Fix multi-selector regex expansion.
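
The gzip fix above reflects a constraint of the DEFLATE format itself: zlib/gzip compression levels run from 0 to 9, and anything higher is invalid. A quick stdlib check (pure illustration, unrelated to polars' internals):

```python
import zlib

data = b"polars " * 100

# Level 9 is the strongest valid setting and round-trips cleanly.
packed = zlib.compress(data, level=9)
restored = zlib.decompress(packed)

# Level 10 is out of range; zlib rejects it with an error.
try:
    zlib.compress(data, level=10)
    level_10_ok = True
except zlib.error:
    level_10_ok = False
```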

🔧 Affected Symbols

`list.eval` · `from_records` · `LazyFrame.match_to_schema` · `list/arr.contains` · `Series.zip_with` · `scan_parquet` · `group_by` · `drop_nans` · `drop_nulls` · `Expr.round` · `dt.truncate` · `str.replace_many` · `unnest` · `Schema` · `drop` · `fill` · `insert_column` · `join`

⚡ Deprecations

  • Added support for the PEP 702 `@deprecated` decorator, which may change how deprecation warnings are emitted and detected.
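
The behaviour PEP 702 standardizes can be sketched with a minimal stdlib-only decorator — this illustrates the pattern, not polars' (or `warnings.deprecated`'s) actual implementation:

```python
import functools
import warnings

def deprecated(message: str):
    """Minimal sketch of a PEP 702-style decorator: calling the wrapped
    function emits a DeprecationWarning but still runs it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        # PEP 702 also records the message so static checkers can see it.
        wrapper.__deprecated__ = message
        return wrapper
    return decorate

@deprecated("use new_api() instead")
def old_api(x):
    return x * 2
```

Python 3.13 ships this behaviour as `warnings.deprecated` (with a `typing_extensions` backport); the sketch above only shows the runtime-warning half of what PEP 702 specifies.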