
py-1.30.0

📦 polars · View on GitHub →

✨ 20 features · 🐛 44 fixes · 🔧 20 symbols

Summary

This release focuses heavily on performance improvements across various operations, including optimizer casts, parallelism, and elementwise execution. It introduces several new features like list.filter, LazyFrame.match_to_schema, and enhanced type inference, alongside numerous bug fixes addressing panics and incorrect outputs.

✨ New Features

  • Implemented list.filter.
  • Support BinaryOffset in search_sorted.
  • Add nulls_equal flag to list/arr.contains.
  • Implement LazyFrame.match_to_schema.
  • Improved time-string parsing and inference (generally, and via the SQL interface).
  • Allow .over to be called without partition_by.
  • Support AnyValue translation from PyMapping values.
  • Support optimised init from non-dict Mapping objects in from_records and frame/series constructors.
  • Support inference of Int128 dtype from databases that support it.
  • Add options to write Parquet field metadata.
  • Add cast_options parameter to control type casting in scan_parquet.
  • Allow casting List<UInt8> to Binary.
  • Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT.
  • Support use of literal values as "other" when evaluating Series.zip_with.
  • Allow reading and writing custom file-level Parquet metadata.
  • Support PEP 702 @deprecated decorator behaviour.
  • Support grouping by pl.Array.
  • Preserve exception type and traceback for errors raised from Python.
  • Use fixed-width font in the streaming physical plan graph.
  • Load AWS endpoint_url using boto3.

🐛 Bug Fixes

  • Fix RuntimeError when serializing the same DataFrame from multiple threads.
  • Fix map_elements predicate pushdown.
  • Fix reverse list type.
  • Don't require numpy for search_sorted.
  • Add type equality checking for relevant methods.
  • Fix invalid output for fill_null after when.then on structs.
  • Don't panic for cross join with misaligned chunking.
  • Fix panic on quantile over nulls in rolling window.
  • Respect BinaryOffset metadata.
  • Correct the output order of PartitionByKey and PartitionParted.
  • Fallback to non-strict casting for deprecated casts.
  • Handle sliced out remainder for bitmaps.
  • Don't merge Enum categories on append.
  • Fix unnest() not working on empty struct columns.
  • Fix the default value type in Schema init.
  • Correct name in unnest error message.
  • Provide "schema" to DataFrame, even if empty.
  • Properly account for nulls in the is_not_nan check made in drop_nans.
  • Fix incorrect result from SQL count(*) with PARTITION BY.
  • Fix deadlock joining scanned tables with low thread count.
  • Don't allow deserializing incompatible DSL.
  • Fix incorrect null dtype from binary ops in empty group_by.
  • Don't mark str.replace_many with Mapping as deprecated.
  • Gzip has maximum compression of 9, not 10.
  • Fix predicate pushdown of fallible expressions.
  • Fix index out of bounds panic when scanning Hugging Face.
  • Fix panic on group_by with literal and empty rows.
  • Return input instead of panicking if empty subset in drop_nulls() and drop_nans().
  • Bump argminmax to 0.6.3.
  • Fix DSL version deserialization endianness.
  • Allow Expr.round() to be called on integer dtypes.
  • Fix panic when filtering based on row index column in parquet.
  • Fix WASM and Pyodide compilation.
  • Resolve get() SchemaMismatch panic.
  • Fix panic in group_by_dynamic on single-row df with group_by.
  • Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays).
  • Fix interpolate on dtype Decimal.
  • Fix CSV row count skipping the last line when the file did not end with a newline.
  • Make nested strict casting actually strict.
  • Make replace and replace_strict mapping use list literals.
  • Allow pivot on Time column.
  • Fix error when providing CSV schema with extra columns.
  • Fix panic on bitwise op between Series and Expr.
  • Fix multi-selector regex expansion.

🔧 Affected Symbols

list.eval, from_records, scan_parquet, list.contains, LazyFrame.match_to_schema, dt.truncate, str.replace_many, group_by, drop_nulls(), drop_nans(), Expr.round(), unnest(), Schema, fill_null, PartitionByKey, PartitionParted, sink_*, list.get, insert_column, join