py-1.22.0
📦 polars
✨ 29 features · 🐛 32 fixes · ⚡ 1 deprecation · 🔧 19 symbols
Summary
This release focuses heavily on performance improvements across various operations, especially within the new streaming engine, and introduces significant enhancements to I/O capabilities, including better support for Unity Catalog and IO plugins. Several bugs related to type handling, aggregations, and specific functions like `Expr.over` and `top_k` have also been resolved.
Migration Steps
- If you rely on the old streaming engine, plan to migrate to the new streaming engine as the old one is deprecated.
✨ New Features
- Projection pushdown added to new streaming multiscan.
- Implement join on struct dtype.
- Enable ingest of objects supporting the PyCapsule interface via `from_arrow`.
- Enable new streaming multiscan for CSV.
- Add the `POLARS_MAX_CONCURRENT_SCANS` environment variable for multiscan in the new streaming engine.
- Multi/Hive scans supported in the new streaming engine.
- Added `linear_spaces` function.
- IO plugins now support lazy schema.
- Added `write_table()` function to Unity catalog client.
- Added `is_object` method to Polars `DataType` class.
- Implement `merge_sorted` for binary.
- Hold string cache in new streaming engine and fix row-encoding.
- Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes.
- Expose unity catalog dataclasses and type aliases.
- Support max/min method for Time dtype.
- Implement a streaming merge sorted node.
- Automatically use temporary credentials API for scanning Unity catalog tables.
- Add negative slice support to new-streaming engine.
- Allow for more RG skipping by rewriting expr in planner.
- Rename catalog `schema` to `namespace`.
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client.
- Allow custom JSONEncoder for the `json_normalize` function, minor speedup.
- Support passing `aws_profile` in `storage_options`.
- Improved support for KeyboardInterrupts.
- Make the available `concat` alignment strategies more generic.
- Extract timezone info from python datetimes.
- Add hint for `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to error message.
- Filter Parquet pages with `ParquetColumnExpr`.
- Expose descending and nulls last in window order-by.
🐛 Bug Fixes
- Fix `Expr.over` applying scale incorrectly for Decimal types.
- Fix IO plugin predicate with failed serialization.
- Ensure `lit` handles datetimes with tzinfo that represents a fixed offset from UTC.
- Correctly implement `take_(opt_)chunked_unchecked` for structs.
- Restore printing backtraces on panics.
- Use microseconds for Unity catalog datetime unit.
- Fix incorrect output height for SQL `SELECT COUNT(*) FROM`.
- Validate/coerce types for comparisons within `join_where` predicates.
- Do not auto-init credential providers if credential fetch returns error.
- Fix `join_where` incorrectly dropping transformations on RHS of equality expressions.
- Fix quadratic allocations when loading nested Parquet column metadata.
- Invalidate sortedness flag when sorting from `pl.Categorical` to `pl.Categorical("lexical")`.
- Fix panic when calling `top_k` on list type.
- Fix rolling on empty DataFrame panicking.
- Fix `set_tbl_width_chars` panicking with negative width.
- Ensure `write_excel` recognises the Array dtype and writes it out as a string.
- Fix `merge_sorted` producing incorrect results or panicking for some logical types.
- Fix all-null list aggregations returning Null dtype.
- Ensure scalar-only `with_columns` are broadcasted on new-streaming.
- Improve SQL interface behaviour when `INTERVAL` is not a fixed duration.
- Address minor regression for one-column DataFrame passed to `is_in` expressions.
- Add Arrow Float16 conversion DataType.
- Revert length check of `patterns` in `str.extract_many()`.
- Add maintain order for flaky new-streaming test.
- Allow for respawning of new streaming sinks.
- Ensure function name correctness in CSE.
- Don't consume `c_stream` as iterable.
- Validate `pl.Array` shape argument types.
- Fix `from_numpy` returning Null dtype for empty 1D numpy array.
- Consider the original dtypes when selecting columns in `write_excel` function.
- Handle boolean comparisons in Iceberg predicate pushdown.
- Fix `map_elements` panicking with Decimal type.
🔧 Affected Symbols
`Expr.over`, `lit`, `take_(opt_)chunked_unchecked`, `top_k`, `set_tbl_width_chars`, `write_excel`, `merge_sorted`, `is_in`, `str.extract_many`, `pl.Array`, `from_numpy`, `map_elements`, `pl.Categorical`, `join_where`, `set_sorted`, `Array`, `write_delta`, `json_normalize`, `DataType`
⚡ Deprecations
- The old streaming engine is deprecated.