py-1.21.0

📅 Jan 24, 2025 · 📦 polars

✨ 24 features · 🐛 22 fixes · 🔧 22 symbols

Summary

This release focuses on performance, largely through wider use of `BitmapBuilder`, and on enhancements across the new streaming engine, including new CSV and NDJson sinks. Bug fixes address issues with `Decimal` types, slicing, joins, and error handling.

Migration Steps

If you installed `nest-asyncio` only because Polars needed it, you can remove that dependency; Polars now uses its own custom logic instead.
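
`nest-asyncio` is usually needed to run a coroutine to completion from synchronous code while an event loop is already running, as in Jupyter. These notes do not describe the custom logic Polars adopted in its place, so the following is only a sketch of one common stdlib pattern, using a hypothetical helper `run_sync`: call `asyncio.run` when no loop is running, otherwise run the coroutine on a fresh loop in a worker thread.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Hypothetical helper: run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe here.
        return asyncio.run(coro)
    # A loop is already running (e.g. inside Jupyter); asyncio.run() would
    # raise here, so run the coroutine on a new loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fetch_rows() -> list[int]:
    await asyncio.sleep(0)  # stand-in for an async database query
    return [1, 2, 3]


async def called_from_running_loop() -> list[int]:
    # Simulates synchronous library code invoked while a loop is running.
    return run_sync(fetch_rows())


print(run_sync(fetch_rows()))                   # no running loop
print(asyncio.run(called_from_running_loop()))  # loop already running
```

The worker-thread path blocks the calling loop until the coroutine finishes. That is acceptable for a synchronous API, and it avoids patching asyncio to allow nested loops, which is what `nest-asyncio` does.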

✨ New Features

Stabilize methods/functions
Add `linear_space`
Improve string → temporal parsing in `read_excel` and `read_ods`
Implement `df.unique()` on the new streaming engine
Experimental credential provider support for Delta read/scan/write
Allow column expressions in DataFrame `unnest`
Auto-initialize Python credential providers in more cases
Add unique operations for Decimal dtype
Add NDJson sink for the new streaming engine
Support nested keys in window functions
Add CSV sink for the new streaming engine
Periodically check Python signals (Ctrl-C handling)
Experimental Unity Catalog client
Support cumulative aggregations for `Decimal` dtype
Account for SurrealDB Python API updates (handle both `SurrealDB` and `AsyncSurrealDB` classes) in `read_database`
Drop `nest-asyncio` in favor of custom logic
Improve window function caching strategy
Support `lakefs://` URIs in the Delta scanner
Additional support for loading `numpy.float16` values (as Float32)
Implement negative slicing for IPC in the new streaming engine
Debloat Series bitops
Reduce Python map bloat
Dispatch to the in-memory engine for `AExpr::Gather`
Dispatch to the in-memory engine for multifile sources
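
The `linear_space` entry adds evenly spaced sampling between two endpoints. Polars exposes it as `pl.linear_space`; its API reference gives the exact parameters and return type. The plain-Python model below (hypothetical `linear_space_model`) shows the arithmetic, assuming a `closed` option with the four usual values: `"both"` keeps both endpoints, `"left"` or `"right"` keeps only that endpoint, and `"none"` keeps neither.

```python
def linear_space_model(
    start: float, end: float, num_samples: int, closed: str = "both"
) -> list[float]:
    """Hypothetical plain-Python model of evenly spaced sampling."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    if num_samples == 0:
        return []
    span = end - start
    if closed == "both":
        if num_samples == 1:
            return [float(start)]  # a choice made for this sketch
        step = span / (num_samples - 1)  # n points, n - 1 gaps
        first = 0
    elif closed == "left":
        step = span / num_samples  # drop the end point
        first = 0
    elif closed == "right":
        step = span / num_samples  # drop the start point
        first = 1
    elif closed == "none":
        step = span / (num_samples + 1)  # drop both end points
        first = 1
    else:
        raise ValueError(f"unknown closed value: {closed!r}")
    return [start + (i + first) * step for i in range(num_samples)]


print(linear_space_model(0, 1, 5))                  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linear_space_model(0, 1, 3, closed="none"))   # [0.25, 0.5, 0.75]
```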
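
With cumulative aggregations supported on `Decimal`, running totals stay in exact fixed-point arithmetic rather than passing through floats. The sketch below models this with Python's `decimal` module and a hypothetical `cum_agg` helper. It assumes nulls are passed through as null and skipped by the running state, as Polars' `cum_sum` does for other numeric types; it is not Polars' implementation.

```python
from decimal import Decimal
from typing import Callable, Iterable, Optional


def cum_agg(
    values: Iterable[Optional[Decimal]],
    op: Callable[[Decimal, Decimal], Decimal],
) -> list[Optional[Decimal]]:
    """Hypothetical model of a cumulative aggregation over a nullable Decimal column."""
    out: list[Optional[Decimal]] = []
    acc: Optional[Decimal] = None
    for v in values:
        if v is None:
            out.append(None)  # null in, null out; running state is untouched
            continue
        acc = v if acc is None else op(acc, v)
        out.append(acc)
    return out


prices = [Decimal("1.10"), Decimal("2.25"), None, Decimal("0.05")]
print(cum_agg(prices, lambda a, b: a + b))
# [Decimal('1.10'), Decimal('3.35'), None, Decimal('3.40')]
print(cum_agg(prices, max))
# [Decimal('1.10'), Decimal('2.25'), None, Decimal('2.25')]
```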
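
A negative slice counts its offset from the end of the data, which a streaming reader does not know until the last batch arrives. One way to support it, sketched here in plain Python rather than taken from Polars' IPC source, is to keep only the last `-offset` rows in a bounded buffer as batches stream past, then take `length` rows from that buffer at the end. `negative_slice` and the list-of-batches input are hypothetical.

```python
from collections import deque
from typing import Iterable, TypeVar

T = TypeVar("T")


def negative_slice(batches: Iterable[list[T]], offset: int, length: int) -> list[T]:
    """Hypothetical model of slice(offset, length) with offset < 0 over streamed batches."""
    if offset >= 0:
        raise ValueError("this sketch only handles negative offsets")
    # Keep at most the last -offset rows; older rows fall out of the deque.
    tail: deque = deque(maxlen=-offset)
    for batch in batches:
        tail.extend(batch)
    # The slice starts at the oldest buffered row and takes up to `length` rows.
    return list(tail)[:length]


batches = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
print(negative_slice(batches, offset=-4, length=2))  # [5, 6]
```

Memory use is bounded by `-offset` rows rather than by the size of the input.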

🐛 Bug Fixes

Warn if `join_asof` keys are not sorted
Ensure explicit values given to `column_widths` override autofit in `write_excel`
Avoid name collisions and panicking in object conversion
Incorrect scale used in `log` and `exp` for Decimal type
Don't deep clone `ManuallyDrop` in `GroupsPosition`
Fix DuplicateError when selecting columns after `join_where` or cross join + filter
Incorrect `Decimal` value for `fill_null(strategy="one")`
Fix one (of many) edge cases where int128 literals did not work
Add a height check to frame-level row indexing when the key is an int
Remove an `assert` that panicked on `group_by` followed by `head(n)` when `n` is larger than the frame height
Raise an error when selectors are combined with `+`
Fix panic `InvalidHeaderValue` scanning from S3 on Windows
Fix `clip` for `Decimal` returning wrong values
Incorrect height from slicing after projecting only the file path column
Shift mask when skipping Bitpacked values in Parquet
Raise an error instead of truncating on a length mismatch in several `str` functions
Support cumulative aggregations for `Decimal` dtype
Allow `is_in` values to be given as custom `Collection`
Propagate null instead of panicking in `pl.repeat_by()`
Do not print sensitive information to output on `POLARS_VERBOSE`
Ignore file cache allocation error if `fallocate()` is not permitted
Incorrect logic in `assert_series_equal` for infinities
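
Fixed-point decimals such as Polars' `Decimal` dtype store each value as an unscaled integer together with a column-wide scale, so the value one in a column with scale 2 is physically the integer 100. A fill value such as `strategy="one"` therefore has to be rescaled to the column's scale. The sketch below uses Python's `decimal` module with hypothetical helpers `to_unscaled` and `from_unscaled` to show the difference; it does not reproduce Polars' code or the specific cause of the bug.

```python
from decimal import Decimal


def to_unscaled(value: Decimal, scale: int) -> int:
    """Hypothetical helper: physical integer for `value` at a fixed scale."""
    # Shift the decimal point right by `scale` digits, then drop the fraction.
    return int(value.scaleb(scale).to_integral_exact())


def from_unscaled(raw: int, scale: int) -> Decimal:
    """Hypothetical helper: logical value of a physical integer at `scale`."""
    return Decimal(raw).scaleb(-scale)


scale = 2
print(to_unscaled(Decimal(1), scale))  # 100: "one" at scale 2
print(from_unscaled(100, scale))       # 1.00
print(from_unscaled(1, scale))         # 0.01: an unscaled 1 is not "one"
```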
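
Tolerance-based float comparison needs special care with infinities: `inf - inf` is NaN, so a check like `abs(a - b) <= atol + rtol * abs(b)` reports two equal infinities as different. The hypothetical helper `values_close` below shows one correct approach (compare exactly first, then treat any remaining infinite value as unequal); Python's own `math.isclose` performs the same exact-equality check first. This illustrates the pitfall and is not the change made to `assert_series_equal`; the default tolerances here are arbitrary.

```python
import math


def naive_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    # Breaks for equal infinities: inf - inf is nan, and nan <= x is False.
    return abs(a - b) <= atol + rtol * abs(b)


def values_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Hypothetical tolerance check that handles infinities."""
    if a == b:  # covers inf == inf and -inf == -inf exactly
        return True
    if math.isinf(a) or math.isinf(b):
        return False  # inf vs -inf, or an infinity vs a finite value
    return abs(a - b) <= atol + rtol * abs(b)


inf = math.inf
print(naive_close(inf, inf))           # False (wrong)
print(values_close(inf, inf))          # True
print(values_close(inf, -inf))         # False
print(values_close(1.0, 1.0 + 1e-9))   # True
```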

🔧 Affected Symbols

`read_excel`, `read_ods`, `df.unique()`, `unnest`, `read_database`, `SurrealDB`, `AsyncSurrealDB`, `nest-asyncio`, delta scanner, `numpy.float16`, `write_excel`, `column_widths`, `log`, `exp`, `Decimal`, `fill_null`, `group_by`, `head(n)`, `clip`, `pl.repeat_by()`, `assert_series_equal`, `AExpr::Gather`