py-1.39.0
📦 polars
✨ 45 features · 🐛 53 fixes · 🔧 41 symbols
Summary
This release focuses heavily on performance improvements across the streaming engine, I/O operations (CSV, NDJSON, Parquet), and various expression evaluations. Numerous bugs related to type handling, panics in streaming operations, and SQL compatibility have also been addressed.
Migration Steps
- If you were relying on Boolean arithmetic with integer literals producing an Unknown type in the streaming engine, this behavior has been corrected and may require updates to downstream logic.
- If you were using LazyFrame.__contains__, be aware that it now issues a PerformanceWarning.
✨ New Features
- Support Expr for holidays in business day calculations
- Parameter for pivot to always include value column name
- Extend Expr.reinterpret to all numeric types of the same size
- Add missing_columns parameter to scan_csv
- Clear no-op scan projections
- Support nested datatypes for {min,max}_by
- Support SQL ARRAY init from typed literals
- Accept table identifier string in scan_iceberg()
- Add a convenience make fresh command to the Makefile
- Expose "use_zip64" Workbook option for write_excel
- Add unstable LazyFrame.sink_iceberg
- Add maintain order argument on implode
- Speed up casting primitive to bool by at least 2x
- Support ASCII format table input to pl.from_repr
- Enable rowgroup skipping for float columns
- Add expression context to errors
- Add Decimal support for product reduction
- Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter
- Re-work behavior of arrow_schema parameter on sink_parquet
- Add contains_dtype() method for Schema
- Implement truncate as a "to_zero" rounding mode
- More generic streaming GroupBy lowering
- Create an Alignment TypeAlias
- Add basic MemoryManager to track buffered dataframes for out-of-core support later
- Add truncate Expression for numeric values
- Better error messages for hex literal conversion issues in the SQL interface
- Add SQL support for LPAD and RPAD string functions
- Support SQL "FROM-first" SELECT query syntax
- Improve base_type typing
- Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers
- Expose unstable assert_schema_equal in py-polars
- Allow parsing of compact ISO 8601 strings
- Add optional "label" param to DataFrame corr
- Configuration to cast integers to floats in cast_options for scan_parquet
- Add escaping to quotes and newlines when reading JSON object into string
- Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes
- Support sas_token in Azure credential provider
- Relax SQL requirement for derived tables and subqueries to have aliases
- Add polars-config and pl.Config.reload_env_vars()
- Record path for object store error raised from sinks
- Use CRC64NVME for checksum in aws sinks
- Add get() for binary Series
- Add streaming AsOf join node
- Add primitive filter -> agg lowering in streaming GroupBy
- Support for the SQL FETCH clause
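The truncate entries above describe rounding toward zero. As a rough illustration of that behavior, here is a plain-Python sketch; it does not use the polars API, whose exact method name and parameters are not given in these notes, and the helper `truncate_to_zero` is hypothetical.

```python
import math


def truncate_to_zero(value: float, decimals: int = 0) -> float:
    """Drop digits beyond `decimals`, rounding toward zero.

    Hypothetical helper for illustration; not part of polars.
    """
    factor = 10 ** decimals
    # math.trunc discards the fractional part, so negative values
    # move up toward zero instead of down as with math.floor.
    return math.trunc(value * factor) / factor


positive = truncate_to_zero(2.7)         # fractional part dropped
negative = truncate_to_zero(-2.7)        # rounds toward zero, not down
floored = math.floor(-2.7)               # floor, shown for contrast
one_place = truncate_to_zero(-1.25, 1)   # keeps one decimal place

print(positive, negative, floored, one_place)
```

The difference from floor only shows up for negative inputs, which is why the contrast case is included.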
🐛 Bug Fixes
- Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine
- Fix sink to partitioned S3 from Windows corrupted slashes
- Remove outdated warning about List columns in unique()
- Restore pyarrow predicate conversion for is_in
- Release GIL before df.to_ndarray() to avoid deadlock
- Fix panic on CSV count_rows with FORCE_ASYNC
- Add scalar comparisons for UInt128 series
- Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat
- Fix streaming zip-broadcast node did not raise shape mismatch on empty recv from ready port
- Fix incorrect output list.eval with scalar expr, fix panic on list.agg with nulls
- Allow list argument in group_by().map_groups()
- Support for ADBC drivers instantiated with dbc in DataFrame.write_database
- Incorrect arg_sort with descending+limit
- Return ComputeError instead of panicking in map_groups UDF
- Issue PerformanceWarning in LazyFrame.__contains__
- Correct type hint for map_columns function parameter
- Apply thousands_separator to count/null_count in describe() for non-numeric columns
- Ensure proper handling of timedelta when multiplying with a Series
- Correct type hint for function parameter in DataFrame.map_columns
- Segfault in JoinExec on deep plan
- Fix unary expressions on literal in over context
- Fix {min,max}_by in streaming engine for Boolean full {min,max} value column
- Fix debug panic on clip with nan bound
- Support grouped {arg_,}_{min,max} for Categoricals
- Throw an error if a string is passed to LazyFrame.pivot on_columns
- Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types
- Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface
- Prevent infinite recursion in streaming group_by fallback
- Use RowEncodingContext::Struct when determining D::Struct encoded item len
- Incorrectly applied CSE on different map_batches functions
- Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING
- Prevent predicate pushdown across Sort with baked-in slice
- Restore compatibility with pd.Timedelta
- Fix panic on lazy sink_parquet created in pipe_with_schema
- Support {column_name} and {index} placeholders in pl.format string
- Do not use merge-join if nulls_last is unknown
- Normalize float zeros in Parquet column statistics
- Fix out-of-bounds for positive offset in windowed rolling
- Raise error when .get() is out-of-bounds in group by context
- Boolean bitwise_xor aggregation inverted when column contains nulls
- Parameter nulls_last was ignored in over
- Allow missing time in inexact strptime
- Respect nulls_last in sort_by within group_by().agg() slow path
- Return NaN when using corr() with a literal and expr
- Allow strict horizontal concat with empty df
- Fix PoisonError panic caused by reentrant usage of file cache
- Return null for int values exceeding 128-bit range with strict=False
- Incorrect boolean min/max with nulls
- Slice-slice pushdown for n_rows
- Resolve panic in Enum struct slicing
- Fix CSPE for group_by.map_groups
- Remove non-existent parameter from SQLContext typing overloads
- Address pl.from_epoch losing fractional seconds
Affected Symbols
arg_{min,max}, scan_csv, scan_ndjson, scan_lines, Expr.reinterpret, scan_csv, sink_parquet, Schema.contains_dtype(), truncate Expression, sink_iceberg, implode, cast_options, scan_parquet, pl.from_repr, rolling_cov(), rolling_corr(), pl.format, DataFrame.corr, LazyFrame.pivot, DataFrame.write_database, list.eval, list.agg, group_by().map_groups(), df.to_ndarray(), CSV count_rows, UInt128 series operations, streaming horizontal concat, streaming zip-broadcast node, arg_sort, .collect_schema(), map_groups UDF, LazyFrame.__contains__, map_columns, describe(), timedelta multiplication, DataFrame.map_columns, JoinExec, {min,max}_by, clip, {arg_,}_{min,max} for Categoricals, pl.from_epoch