
py-1.39.0

📦 polars
45 features · 🐛 53 fixes · 🔧 41 symbols

Summary

This release focuses heavily on performance improvements across the streaming engine, I/O operations (CSV, NDJSON, Parquet), and various expression evaluations. Numerous bugs related to type handling, panics in streaming operations, and SQL compatibility have also been addressed.

Migration Steps

  1. Boolean arithmetic with integer literals no longer produces an Unknown type in the streaming engine. Update any downstream logic that relied on the old Unknown result.
  2. LazyFrame.__contains__ now issues a PerformanceWarning. Code that tests column membership directly on a LazyFrame will start emitting this warning.
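The second step can be illustrated with a standard-library sketch. FakeLazyFrame and the local PerformanceWarning class are hypothetical stand-ins, not the Polars classes; the sketch assumes the warning exists because each membership check resolves the schema, as with other schema-resolving LazyFrame accessors. It shows that resolving the schema once and testing membership on the result avoids the repeated warning.

```python
import warnings


class PerformanceWarning(Warning):
    """Hypothetical stand-in for the Polars PerformanceWarning category."""


class FakeLazyFrame:
    """Hypothetical stand-in for a LazyFrame; NOT the Polars class."""

    def __init__(self, columns):
        self._columns = list(columns)

    def collect_schema(self):
        # Stand-in for schema resolution, which can be costly on a real lazy plan.
        return {name: None for name in self._columns}

    def __contains__(self, name):
        # Assumed behaviour: warn, then resolve the schema on every check.
        warnings.warn(
            "resolving the schema may be expensive",
            PerformanceWarning,
            stacklevel=2,
        )
        return name in self.collect_schema()


lf = FakeLazyFrame(["a", "b"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Each `in` check on the frame itself triggers one warning.
    slow = [name in lf for name in ("a", "b", "z")]
    # Resolve the schema once, then check membership without warnings.
    schema = lf.collect_schema()
    fast = [name in schema for name in ("a", "b", "z")]

n_warnings = len(caught)
```

Both approaches give the same answers; only the first emits warnings, one per check.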

✨ New Features

  • Support Expr for holidays in business day calculations
  • Parameter for pivot to always include value column name
  • Extend Expr.reinterpret to all numeric types of the same size
  • Add missing_columns parameter to scan_csv
  • Clear no-op scan projections
  • Support nested datatypes for {min,max}_by
  • Support SQL ARRAY init from typed literals
  • Accept table identifier string in scan_iceberg()
  • Add a convenience make fresh command to the Makefile
  • Expose "use_zip64" Workbook option for write_excel
  • Add unstable LazyFrame.sink_iceberg
  • Add maintain_order argument to implode
  • Speed up casting primitive to bool by at least 2x
  • Support ASCII format table input to pl.from_repr
  • Enable rowgroup skipping for float columns
  • Add expression context to errors
  • Add Decimal support for product reduction
  • Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter
  • Re-work behavior of arrow_schema parameter on sink_parquet
  • Add contains_dtype() method for Schema
  • Implement truncate as a "to_zero" rounding mode
  • More generic streaming GroupBy lowering
  • Create an Alignment TypeAlias
  • Add basic MemoryManager to track buffered dataframes for out-of-core support later
  • Add truncate Expression for numeric values
  • Better error messages for hex literal conversion issues in the SQL interface
  • Add SQL support for LPAD and RPAD string functions
  • Support SQL "FROM-first" SELECT query syntax
  • Improve base_type typing
  • Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers
  • Expose unstable assert_schema_equal in py-polars
  • Allow parsing of compact ISO 8601 strings
  • Add optional "label" param to DataFrame corr
  • Configuration to cast integers to floats in cast_options for scan_parquet
  • Add escaping to quotes and newlines when reading JSON object into string
  • Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes
  • Support sas_token in Azure credential provider
  • Relax SQL requirement for derived tables and subqueries to have aliases
  • Add polars-config and pl.Config.reload_env_vars()
  • Record path for object store error raised from sinks
  • Use CRC64NVME for checksum in aws sinks
  • Add get() for binary Series
  • Add streaming AsOf join node
  • Add primitive filter -> agg lowering in streaming GroupBy
  • Support for the SQL FETCH clause
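Two of the entries above concern truncation: truncate as a "to_zero" rounding mode, and a truncate expression for numeric values. These notes do not show the Polars signature, so the following is only a standard-library sketch of what rounding toward zero means. The helper names and the decimals parameter are hypothetical, not the Polars API.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def truncate_toward_zero(value: str, decimals: int = 0) -> Decimal:
    """Hypothetical helper: drop digits beyond `decimals`, rounding toward zero."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=2 -> Decimal("0.01")
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


def round_half_even(value: str, decimals: int = 0) -> Decimal:
    """Conventional rounding, shown only for contrast."""
    quantum = Decimal(1).scaleb(-decimals)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


for raw in ("2.789", "-2.789"):
    print(raw, truncate_toward_zero(raw, 2), round_half_even(raw, 2))
```

Truncation moves both inputs toward zero (2.78 and -2.78), which differs from a floor for negative values and from conventional rounding for both.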

🐛 Bug Fixes

  • Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine
  • Fix corrupted slashes when sinking to partitioned S3 from Windows
  • Remove outdated warning about List columns in unique()
  • Restore pyarrow predicate conversion for is_in
  • Release GIL before df.to_ndarray() to avoid deadlock
  • Fix panic on CSV count_rows with FORCE_ASYNC
  • Add scalar comparisons for UInt128 series
  • Raise shape error for 0-width inputs with non-0 height in streaming horizontal concat
  • Fix streaming zip-broadcast node not raising shape mismatch on empty recv from ready port
  • Fix incorrect output of list.eval with scalar expr; fix panic on list.agg with nulls
  • Allow list argument in group_by().map_groups()
  • Support for ADBC drivers instantiated with dbc in DataFrame.write_database
  • Incorrect arg_sort with descending+limit
  • Return ComputeError instead of panicking in map_groups UDF
  • Issue PerformanceWarning in LazyFrame.__contains__
  • Correct type hint for map_columns function parameter
  • Apply thousands_separator to count/null_count in describe() for non-numeric columns
  • Ensure proper handling of timedelta when multiplying with a Series
  • Correct type hint for function parameter in DataFrame.map_columns
  • Segfault in JoinExec on deep plan
  • Fix unary expressions on literal in over context
  • Fix {min,max}_by in streaming engine for Boolean full {min,max} value column
  • Fix debug panic on clip with nan bound
  • Support grouped {arg_,}{min,max} for Categoricals
  • Throw an error if a string is passed to LazyFrame.pivot on_columns
  • Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types
  • Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface
  • Prevent infinite recursion in streaming group_by fallback
  • Use RowEncodingContext::Struct when determining D::Struct encoded item len
  • Fix CSE (common subexpression elimination) incorrectly applied to different map_batches functions
  • Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING
  • Prevent predicate pushdown across Sort with baked-in slice
  • Restore compatibility with pd.Timedelta
  • Fix panic on lazy sink_parquet created in pipe_with_schema
  • Support {column_name} and {index} placeholders in pl.format string
  • Do not use merge-join if nulls_last is unknown
  • Normalize float zeros in Parquet column statistics
  • Fix out-of-bounds for positive offset in windowed rolling
  • Raise error when .get() is out-of-bounds in group by context
  • Boolean bitwise_xor aggregation inverted when column contains nulls
  • Parameter nulls_last was ignored in over
  • Allow missing time in inexact strptime
  • Respect nulls_last in sort_by within group_by().agg() slow path
  • Return NaN when using corr() with a literal and expr
  • Allow strict horizontal concat with empty df
  • Fix PoisonError panic caused by reentrant usage of file cache
  • Return null for int values exceeding 128-bit range with strict=False
  • Incorrect boolean min/max with nulls
  • Slice-slice pushdown for n_rows
  • Resolve panic in Enum struct slicing
  • Fix CSPE (common subplan elimination) for group_by.map_groups
  • Remove non-existent parameter from SQLContext typing overloads
  • Address pl.from_epoch losing fractional seconds
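Several fixes above concern Boolean aggregations over columns that contain nulls (bitwise_xor, min/max). As a rough reference for the bitwise_xor case, here is a plain-Python sketch of a null-skipping XOR reduction. The function name is hypothetical, and two points are assumptions rather than statements from these notes: that nulls are skipped, and that an all-null input yields None.

```python
from functools import reduce
from operator import xor
from typing import Optional, Sequence


def bool_xor_skip_nulls(values: Sequence[Optional[bool]]) -> Optional[bool]:
    """Hypothetical reference: XOR all non-null Booleans together."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: an empty or all-null input has no defined result.
        return None
    return reduce(xor, present)


examples = [
    [True, None, False],  # one True among the non-null values
    [True, True, None],   # two Trues cancel out
    [None, None],         # nothing to aggregate
]
results = [bool_xor_skip_nulls(column) for column in examples]
```

A reference of this kind is handy in a regression test: the presence of a null should leave the result of the non-null values unchanged, not invert it.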

Affected Symbols