
py-1.22.0

📦 polars
29 features · 🐛 32 fixes · 1 deprecation · 🔧 19 symbols

Summary

This release focuses on performance, especially in the new streaming engine, and significantly extends I/O capabilities, including better support for Unity Catalog and IO plugins. It also resolves several bugs in type handling and aggregations, and in specific functions such as `Expr.over` and `top_k`.

Migration Steps

  1. If you rely on the old streaming engine, plan to migrate to the new streaming engine, because the old one is deprecated.

✨ New Features

  • Added projection pushdown to the new streaming multiscan.
  • Implement join on struct dtype.
  • Enable ingest of objects supporting the PyCapsule interface via `from_arrow`.
  • Enable new streaming multiscan for CSV.
  • Introduced the `POLARS_MAX_CONCURRENT_SCANS` environment variable for multiscan in the new streaming engine.
  • Multi-file and Hive scans are now supported in the new streaming engine.
  • Added `linear_spaces` function.
  • IO plugins now support lazy schema.
  • Added a `write_table()` function to the Unity Catalog client.
  • Added an `is_object` method to the Polars `DataType` class.
  • Implement `merge_sorted` for binary.
  • Hold the string cache in the new streaming engine and fix row encoding.
  • Add a `CredentialProviderAzure` parameter that accepts user-instantiated Azure credential classes.
  • Expose Unity Catalog dataclasses and type aliases.
  • Support the `max`/`min` methods for the Time dtype.
  • Implement a streaming merge sorted node.
  • Automatically use the temporary credentials API when scanning Unity Catalog tables.
  • Add negative slice support to the new streaming engine.
  • Allow more row-group (RG) skipping by rewriting expressions in the planner.
  • Rename catalog `schema` to `namespace`.
  • Add functions to the Unity Catalog client for creating and deleting catalogs, tables, and schemas.
  • Allow a custom `JSONEncoder` for the `json_normalize` function, with a minor speedup.
  • Support passing `aws_profile` in `storage_options`.
  • Improved support for KeyboardInterrupts.
  • Make the available `concat` alignment strategies more generic.
  • Extract timezone info from Python datetimes.
  • Add a hint about `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to the error message.
  • Filter Parquet pages with `ParquetColumnExpr`.
  • Expose `descending` and `nulls_last` in window `order_by`.

🐛 Bug Fixes

  • Fix `Expr.over` applying scale incorrectly for Decimal types.
  • Fix IO plugin predicates when serialization fails.
  • Ensure `lit` handles datetimes with tzinfo that represents a fixed offset from UTC.
  • Correctly implement `take_(opt_)chunked_unchecked` for structs.
  • Restore printing backtraces on panics.
  • Use microseconds as the Unity Catalog datetime unit.
  • Fix incorrect output height for SQL `SELECT COUNT(*) FROM`.
  • Validate/coerce types for comparisons within `join_where` predicates.
  • Do not auto-init credential providers if credential fetch returns error.
  • Fix `join_where` incorrectly dropping transformations on RHS of equality expressions.
  • Fix quadratic allocations when loading nested Parquet column metadata.
  • Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical").
  • Fix `top_k` panicking when called on a List column.
  • Fix `rolling` panicking on an empty DataFrame.
  • Fix `set_tbl_width_chars` panicking with negative width.
  • Ensure `write_excel` recognises the Array dtype and writes it out as a string.
  • Fix `merge_sorted` producing incorrect results or panicking for some logical types.
  • Fix all-null list aggregations returning Null dtype.
  • Ensure scalar-only `with_columns` expressions are broadcast in the new streaming engine.
  • Improve SQL interface behaviour when `INTERVAL` is not a fixed duration.
  • Address minor regression for one-column DataFrame passed to `is_in` expressions.
  • Add a DataType conversion for Arrow Float16.
  • Revert length check of `patterns` in `str.extract_many()`.
  • Maintain order in a flaky new streaming engine test.
  • Allow new streaming sinks to be respawned.
  • Ensure function names are correct in CSE (common subexpression elimination).
  • Don't consume `c_stream` as an iterable.
  • Validate `pl.Array` shape argument types.
  • Fix `from_numpy` returning Null dtype for empty 1D numpy array.
  • Consider the original dtypes when selecting columns in `write_excel` function.
  • Handle boolean comparisons in Iceberg predicate pushdown.
  • Fix `map_elements` panicking with Decimal type.

🔧 Affected Symbols

`Expr.over`, `lit`, `take_(opt_)chunked_unchecked`, `top_k`, `set_tbl_width_chars`, `write_excel`, `merge_sorted`, `is_in`, `str.extract_many`, `pl.Array`, `from_numpy`, `map_elements`, `pl.Categorical`, `join_where`, `set_sorted`, `Array`, `write_delta`, `json_normalize`, `DataType`

⚡ Deprecations

  • The old streaming engine is deprecated.