py-1.25.2

📦 polars
20 features · 🐛 38 fixes · 1 deprecation · 🔧 12 symbols

Summary

This release brings performance improvements, including common subplan elimination across plans in collect_all and linear-time rolling operations. It also adds new features such as lazy sinks and broadens the new streaming engine with cloud scans and sinks, an NDJSON source, and partitioned sinks.

✨ New Features

  • Enable common subplan elimination across plans in collect_all
  • Add lazy sinks
  • Add PartitionByKey for new streaming sinks
  • Enable new streaming memory sinks by default
  • Add support for rolling_(sum/min/max) for booleans through casting
  • Support multi-column sort for all nested types and nested search-sorted
  • Add mkdir flag to sinks
  • Enable joins on list/array dtypes
  • Add a config option to specify the default engine to attempt to use during lazyframe calls
  • Support all elementwise functions in IO plugin predicates
  • Stabilize Enum datatype
  • Support Polars int128 in from arrow
  • Cloud support for new-streaming scans and sinks
  • Add len method to arr
  • Closeable files on unix
  • Add new PartitionMaxSize sink
  • Support engine callback for LazyFrame.profile
  • Dispatch new-streaming CSV negative slice to separate node
  • Add NDJSON source to new streaming engine
  • Support passing token in storage_options for GCP cloud

🐛 Bug Fixes

  • Expose and document partitions
  • Fix lazy schema for truediv ops involving List/Array dtypes
  • Fix error due to race condition in file cache
  • Clear NaNs due to zero-weight division in rolling var/std
  • Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata
  • Disallow cast from boolean to categorical/enum
  • Don't check sortedness in join_asof when 'by' groups supplied, but issue warning
  • Incorrect multithread path taken for aggregations
  • Disallow cast to empty Enum
  • Fix list.mean and list.median returning Float64 for temporal types
  • Incorrect (FixedSize)ListArrayBuilder gather implementation
  • Always fallback in SkipBatchPredicate
  • New streaming multiscan deadlock
  • Ensure new-streaming join BuildState is correct even if never fed morsels
  • IO plugin; support empty iterator
  • Support nulls in multi-column sort
  • Window function check length of groups state
  • Support 128 sum reduction on new streaming
  • IPC round-trip of list of empty view with non-empty bufferset
  • Variance can never be negative
  • Incorrect loop length in new-streaming group by
  • Right join on multiple columns not coalescing left_on columns
  • Casting Struct to String panics if n_chunks > 1
  • Fix Future attached to different loop error on read_database_uri
  • Fix deadlock in cache + hconcat
  • Properly handle phase transitions in row-wise sinks
  • Always use global registry for object
  • Check enum categories when reading csv
  • Unspecialized prefiltering on nullable arrays
  • Release the gil on explain
  • Take into account scalar/partitioned columns in DataFrame::split_chunks
  • Bad null handling in unordered row encoding
  • Fix deadlock in new streaming CSV / NDJSON sinks
  • Bad view index in BinaryViewBuilder
  • Fix CSV count with comment prefix skipped empty lines
  • New streaming IPC enum scan
  • Several aspects related to ParquetColumnExpr
  • Don't hit parquet::pre-filtered in case of pre-slice

🔧 Affected Symbols

  • collect_all
  • rolling_min/max
  • InputIndependentSelect
  • InMemorySourceNode
  • rolling_(sum/min/max)
  • PartitionByKey
  • join_asof
  • list.mean
  • list.median
  • LazyFrame.profile
  • read_database_uri
  • DataFrame::split_chunks

⚡ Deprecations

  • Remove old-streaming from engine argument