py-1.36.0

📦 polars
53 features · 🐛 54 fixes · 🔧 69 symbols

Summary

This release expands SQL support (window functions, the QUALIFY clause, named WINDOW references), adds new DataFrame and Expr functionality such as binary slicing (bin.slice(), bin.head(), bin.tail()) and broader rolling calculations, and includes numerous performance optimizations, especially for streaming and Parquet I/O. It also fixes many bugs involving panics, dtype handling, and SQL expression resolution.

✨ New Features

  • Add Extension types
  • Add SQL support for the QUALIFY clause
  • Add bin.slice(), bin.head(), and bin.tail() methods
  • Add SQL syntax support for CROSS JOIN UNNEST(col)
  • Add separate env var to log tracked metrics
  • Expose fields for generating physical plan visualization data
  • Allow pl.Object in pivot value
  • Minor improvement for as_struct repr
  • Temporal quantile in rolling context
  • Add quantile for missing temporals
  • Add strict parameter to pl.concat(how='horizontal')
  • Support decimals in search_sorted
  • Expose and document pl.Categories
  • Use reference to Graph pipes when flushing metrics
  • Extend SQL UNNEST support to handle multiple array expressions
  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions
  • Allow elementwise Expr.over in aggregation context
  • Add SQL support for named WINDOW references
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many
  • Automatically Parquet dictionary encode floats
  • Support unique_counts for all datatypes
  • Add maintain_order to Expr.mode
  • Allow hash for all List dtypes
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode
  • Display function of streaming physical plan map node
  • Allow slice on scalar in aggregation context
  • Allow implode and aggregation in aggregation context
  • Move GraphMetrics into StreamingQuery
  • Documentation on Polars Cloud manifests
  • Add empty_as_null and keep_nulls flags to Expr.explode
  • Allow Expr.unique on List/Array with non-numeric types
  • Raise suitable error on non-integer "n" value for clear
  • Allow Expr.rolling in aggregation contexts
  • Allow bare .row() on a single-row DataFrame, equivalent to .item() on a single-element DataFrame
  • Support additional forms of SQL CREATE TABLE statements
  • Add support for Float16 dtype
  • Support column-positional SQL "UNION" operations
  • Add unstable Schema.to_arrow()
  • Make DSL-hash skippable
  • Improve error message on unsupported SQL subquery comparisons
  • Support arbitrary expressions in SQL `JOIN` constraints
  • Allow arbitrary expressions as the `Expr.rolling` `index_column`
  • Set polars/<version> user-agent
  • Support `ewm_var/std` in streaming engine
  • Rewrite `IR::Scan` to `IR::DataFrameScan` in `expand_datasets` when applicable
  • Add ignore_nulls to first / last
  • Allow arbitrary Expressions in "subset" parameter of `unique` frame method
  • Add `BIT_NOT` support to the SQL interface
  • Streaming {Expr,LazyFrame}.rolling
  • Add LazyFrame.pivot
  • Add SQL support for `LEAD` and `LAG` functions
  • Add having to group_by context
  • Add show methods for DataFrame and LazyFrame
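
Several of the SQL additions above (ROW_NUMBER, RANK, DENSE_RANK, and the QUALIFY clause) are meant to be used together: QUALIFY filters rows on the result of a window function. The sketch below is a plain-Python model of what a query such as `SELECT * FROM sales QUALIFY ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount) = 1` computes under standard SQL semantics. It does not call Polars; the helper names `row_number` and `qualify_first` and the sample data are hypothetical, and ties are broken by input order purely as an assumption of this sketch.

```python
from collections import defaultdict


def row_number(rows, partition_by, order_by):
    """Return a 1-based row number for every row, counted within its partition."""
    # Collect the positions of the rows that belong to each partition key.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[partition_by]].append(idx)

    numbers = [0] * len(rows)
    for indices in groups.values():
        # sorted() is stable, so tied rows keep their input order
        # (an assumption of this sketch; SQL leaves tie order unspecified).
        ordered = sorted(indices, key=lambda i: rows[i][order_by])
        for number, idx in enumerate(ordered, start=1):
            numbers[idx] = number
    return numbers


def qualify_first(rows, partition_by, order_by):
    """Model of QUALIFY ROW_NUMBER() OVER (PARTITION BY .. ORDER BY ..) = 1."""
    numbers = row_number(rows, partition_by, order_by)
    # QUALIFY acts as a filter applied after the window function is evaluated.
    return [row for row, number in zip(rows, numbers) if number == 1]


# Hypothetical sample data.
sales = [
    {"region": "north", "amount": 30},
    {"region": "south", "amount": 10},
    {"region": "north", "amount": 20},
    {"region": "south", "amount": 50},
]

cheapest = qualify_first(sales, "region", "amount")
print(cheapest)
```

The result keeps one row per region (the one with the smallest amount), which is the usual reason to combine a ranking function with QUALIFY instead of wrapping the query in a subquery.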

🐛 Bug Fixes

  • Rechunk on nested dtypes in take_unchecked_impl parallel path
  • Fix streaming SchemaMismatch panic on list.drop_nulls
  • Fix panic on Boolean rolling_sum calculation for list or array eval
  • Fix "dtype is unknown" panic in cross joins with literals
  • Fix panic edge-case when scanning hive partitioned data
  • Fix "unreachable code" panic in UDF dtype inference
  • Address potential "batch_size" parameter collision in scan_pyarrow_dataset
  • Fix empty format handling
  • Improve SQL GROUP BY and ORDER BY expression resolution, handling aliasing edge-cases
  • Preserve List inner dtype during chunked take operations
  • Fix lifetime for AmortSeries lazy group iterator
  • Fix spearman panicking on nulls
  • Properly resolve HAVING clause during SQL GROUP BY operations
  • Prevent false positives in is_in for large integers
  • Differentiate between empty list and no list for unpivot
  • Bug in boolean unique_counts
  • Hang in multi-chunk DataFrame .rows()
  • Correct arr_to_any_value for object arrays
  • Have PySeries::new_f16 receive pf16s instead of f32s
  • Set Float16 parquet schema type to Float16
  • Fix incorrect .list.eval after slicing operations
  • Strict conversion AnyValue to Struct
  • Rolling mean/median for temporals
  • Add .rolling_rank() support for temporal types and pl.Boolean
  • Fix occurrence of exact matches in .join_asof(strategy="nearest", allow_exact_matches=False, ...)
  • Always respect return_dtype in map_elements and map_rows
  • Fix group lengths check in sort_by with AggregatedScalar
  • Fix dictionary replacement error in write_ipc()
  • Fix expr slice pushdown causing shape error on literals
  • Allow empty list in sort_by in list.eval context
  • Raise error on out-of-range dates in temporal operations
  • Validate list.slice parameters are not lists
  • Make sum on strings error in group_by context
  • Prevent panic when joining sorted LazyFrame with itself
  • Apply CSV dict overrides by name only
  • Incorrect result in aggregated first/last with ignore_nulls
  • Fix off-by-one bug in `ColumnPredicates` generation for inequalities operating on integer columns
  • Use Cargo.template.toml to prevent git dependencies from using template
  • Fix arr.{eval,agg} in aggregation context
  • Support AggregatedList in list.{eval,agg} context
  • Nested dtypes in streaming first_non_null/last_non_null
  • Remove Expr casts in pl.lit invocations
  • Optimize projection pushdown through HConcat
  • Revert pl.format behavior with nulls
  • Correct eq_missing for struct with nulls
  • Resolve edge-case with SQL aggregates that have the same name as one of the GROUP BY keys
  • Unique on literal in aggregation context
  • Aggregation with drop_nulls on literal
  • SQL NATURAL joins should coalesce the key columns
  • Mark {forward,backward}_fill as length_preserving
  • Correct drop_items for scalar input
  • Schema mismatch with list.agg, unique and scalar
  • AnyValue::to_physical for categoricals
  • Bugs in pl.from_repr with signed exponential floats
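
To make the join_asof fix above concrete, here is a plain-Python model of the intended behaviour of a "nearest" as-of match with exact matches disallowed: each left key should pair with the closest right key that is not equal to it. This is a sketch of the semantics only, not Polars code; the function name `nearest_without_exact`, the sample keys, and the tie-breaking rule are assumptions made for illustration.

```python
def nearest_without_exact(left_keys, right_keys):
    """For each left key, pick the closest right key that differs from it."""
    matches = []
    for key in left_keys:
        # Disallowing exact matches means equal keys are never candidates.
        candidates = [r for r in right_keys if r != key]
        if not candidates:
            matches.append(None)  # no usable partner for this left key
            continue
        # min() returns the first minimum, so on a distance tie the earlier
        # right key wins (tie-breaking here is an assumption of this sketch).
        matches.append(min(candidates, key=lambda r: abs(r - key)))
    return matches


# Hypothetical sample keys: 1 and 10 appear on both sides.
left = [1, 5, 10]
right = [1, 4, 10, 12]

print(nearest_without_exact(left, right))
```

In this model the left keys 1 and 10 are matched to 4 and 12 respectively, never to their identical counterparts on the right side.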

🔧 Affected Symbols

pl.concat, bin.slice, bin.head, bin.tail, pl.Object, as_struct, pl.rolling, pl.concat(how='horizontal'), search_sorted, pl.Categories, str.replace_many, str.find_many, str.extract_many, Expr.over, Expr.mode, Expr.explode, DataFrame.explode, LazyFrame.explode, .row(), .item(), Schema.to_arrow, Expr.rolling, LazyFrame.group_by_dynamic, LazyFrame.rolling, Expr.rolling, first, last, unique, BIT_NOT, LEAD, LAG, group_by_dynamic, group_by, take_unchecked_impl, list.drop_nulls, rolling_sum, cross joins, scan_pyarrow_dataset, is_between, unpivot, unique_counts, arr_to_any_value, PySeries::new_f16, .list.eval, AnyValue, Struct, rolling_rank, join_asof, map_elements, map_rows, sort_by, write_ipc, list.slice, temporal operations, sum on strings, join, CSV dict overrides, ColumnPredicates, arr.{eval,agg}, list.{eval,agg}, first_non_null, last_non_null, pl.lit, HConcat, pl.format, eq_missing, SQL aggregates, drop_items, pl.from_repr