py-1.38.0
📦 polars
✨ 27 features · 🐛 40 fixes · ⚡ 1 deprecation · 🔧 38 symbols
Summary
This release focuses heavily on performance improvements across streaming, I/O, and core computations, alongside numerous bug fixes for stability and correctness. A key change is the deprecation of the `retries` argument in favor of using `storage_options`.
Migration Steps
- Replace usage of `retries=n` with `storage_options={"max_retries": n}`.
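For codebases that forward keyword arguments to Polars readers in many places, the replacement above can be applied mechanically. The following is a minimal sketch, not part of Polars: `migrate_retries` is a hypothetical helper name, and the `aws_region` entry is only an illustrative existing option. It assumes that an explicit `max_retries` already present in `storage_options` should take precedence over the old `retries` value.

```python
from typing import Any


def migrate_retries(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of reader kwargs with `retries` moved into
    `storage_options["max_retries"]`.

    Hypothetical helper for illustration; not a Polars API.
    """
    migrated = dict(kwargs)  # shallow copy so the caller's dict is untouched
    if "retries" not in migrated:
        return migrated

    retries = migrated.pop("retries")
    # Copy any existing storage options instead of mutating them in place.
    storage_options = dict(migrated.get("storage_options") or {})
    # Assumption: an explicit max_retries wins over the deprecated argument.
    storage_options.setdefault("max_retries", retries)
    migrated["storage_options"] = storage_options
    return migrated


# Example: kwargs written against the deprecated `retries` argument.
old_kwargs = {"retries": 5, "storage_options": {"aws_region": "eu-west-1"}}
new_kwargs = migrate_retries(old_kwargs)
print(new_kwargs)
```

The resulting dictionary can then be unpacked into a reader call that accepts `storage_options`, for example `pl.scan_parquet(path, **new_kwargs)`.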
✨ New Features
- Enable zero-copy object_store `put` upload for IPC sink.
- Resolve file schemas and metadata concurrently.
- Run elementwise CSEE for the streaming engine.
- Disable morsel splitting for fast-count on streaming engine.
- Implement streaming decompression for scan_ndjson and scan_lines.
- Add dedicated kernel for group-by `arg_max/arg_min`.
- Add streaming merge-join.
- Generalize Bitmap::new_zeroed opt for Buffer::zeroed.
- Avoid OOM for scan_ndjson and scan_lines when the input is compressed and the slice is negative.
- Support anonymous agg in-mem.
- Add unstable `arrow_schema` parameter to `sink_parquet`.
- Expose `upload_concurrency` through env var.
- Allow quantile to compute multiple quantiles at once.
- Allow empty LazyFrame in `LazyFrame.group_by(...).map_groups`.
- Use delta file statistics for batch predicate pushdown.
- Add streaming UnorderedUnion.
- Implement compression support for sink_ndjson.
- Add unstable record batch statistics flags to `{sink/scan}_ipc`.
- Support CSE for python UDFs on the same address.
- Cloud retry/backoff configuration via `storage_options`.
- Add compression support to write_csv and sink_csv.
- Add `scan_lines`.
- Support regex in `str.split`.
- Add unstable IPC Statistics read/write to `scan_ipc`/`sink_ipc`.
- Add unstable `height` parameter to `DataFrame`/`LazyFrame`.
- Expose ArrowStreamExportable on python collect batches iterator.
- Add nulls support for all rolling_by operations.
🐛 Bug Fixes
- Correct off-by-one in RLE row counting for nullable dictionary-encoded columns.
- Support very large integers in env var limits.
- Fix PlPath panic from incorrect slicing of UTF8 boundaries.
- Fix Float dtype for spearman correlation.
- Fix optimizer panic in right joins with type coercion.
- Don't serialize retry config from local environment vars.
- Fix `PartitionBy` with scalar key expressions and `diff()`.
- Add {Float16, Float32} -> Float32 lossless upcast.
- Fix panic using `with_columns` and `collect_all`.
- Add multi-page support for writing dictionary-encoded Parquet columns.
- Ensure slice advancement when skipping non-inlinable values in `is_in` with inlinable needles.
- Bugs in ViewArray total_bytes_len.
- Overflow in i128::abs in Decimal fits check.
- Make Expr.hash on Categorical mapping-independent.
- Clone shared GroupBy node before mutation in physical plan creation.
- Fixed "sheet_name" typing for `read_ods` and `read_excel`.
- Improve Polars dtype inference from Python `Union` typing.
- Consider the "current location" of an item when computing `rolling_rank_by`.
- Reset `is_count_star` flag between queries in collect_all.
- Fix incorrect is_between filter on scan_parquet.
- Make polars compatible with ty.
- Lower AnonymousStreamingAgg in group-by as aggregate.
- Avoid overflow in `pl.duration` scalar arguments case.
- Broadcast arr.get on single array with multiple indices.
- Fix panic on CSPE with sorts.
- Eager `DataFrame.slice` with negative offset and `length=None`.
- Use correct schema side for streaming merge join lowering.
- Overflow panic in `scan_csv` with multiple files and `skip_rows + n_rows` larger than total row count.
- Respect `allow_object` flag after cache.
- Raise error on non-elementwise PartitionBy keys.
- Allow ordered categorical dictionary in scan_parquet.
- Allow excess bytes on IPC bitmap compressed length.
- Fix deadlock on `hash_rows()` of 0-width DataFrame.
- Fix NameError filtering pyarrow dataset.
- Fix concat_arr panic when using categoricals/enums.
- Fix NDJSON/scan_lines negative slice splitting with extremely long lines.
- Incorrect group_by min/max fast path.
- Remove a source of non-determinism from lowering.
- Error when `with_row_index` or `unpivot` create duplicate columns on a `LazyFrame`.
- Panics on shift with head.
Affected Symbols
retries, storage_options, object_store, scan_ndjson, scan_lines, group_by, arg_max/arg_min, merge-join, Bitmap::new_zeroed, Buffer::zeroed, path expansion, n_unique, sink_parquet, Expr.hash, Categorical, read_ods, read_excel, rolling_rank_by, collect_all, scan_parquet, pl.duration, DataFrame.slice, scan_csv, with_row_index, unpivot, shift, Expr.get, Expr.quantile, Operator::Divide, CategoricalMapping::new, to_alp, str.split, scan_ipc, sink_ipc, DataFrame, LazyFrame, write_csv, sink_csv
⚡ Deprecations
- Deprecate `retries=n` in favor of `storage_options={"max_retries": n}`.