Changelog

py-1.34.0-beta.5

📦 polars
✨ 17 features · 🐛 42 fixes · 🔧 19 symbols

Summary

This release introduces new lazy sink/collect batch methods, enhances performance across various scan and expression operations, and fixes numerous bugs related to data types, streaming, and cloud storage interactions.

✨ New Features

  • Add LazyFrame.{sink,collect}_batches
  • Deterministic import order for Python Polars package variants
  • Implement maintain_order for cross join
  • Add support for outputting dt.total_{}() duration values as fractional numbers
  • Avoid forcing a pyarrow dependency in read_excel when using the default "calamine" engine
  • Support scanning from file:/path URIs
  • Log which file the schema was sourced from, and which file caused an extra column error
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid
  • Add unstable hidden_file_prefix parameter to scan_parquet
  • Use fixed-scale Decimals
  • Add support for unsigned 128-bit integers
  • Add unstable pl.Config.set_default_credential_provider
  • Roundtrip BinaryOffset type through Parquet
  • Add opt-in unstable functionality to load interval types as Struct
  • Support reading parquet metadata from cloud storage
  • Add user guide section on AWS role assumption
  • Support unique / n_unique / arg_unique for array columns

🐛 Bug Fixes

  • Make Categories pickleable
  • Shift on array within list
  • Fix handling of AggregatedScalar in ApplyExpr single input
  • Support reading of mixed compressed/uncompressed IPC buffers
  • Overflow in slice-slice optimization
  • Package discovery for setuptools
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction
  • Remove inclusion of polars dir in runtime sdist/wheel
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous
  • Widen from_dicts to Iterable[Mapping[str, Any]]
  • Fix unsupported arrow type Dictionary error in scan_iceberg()
  • Raise an exception instead of panicking when unnest is called on a non-struct column
  • Include missing feature dependency from polars-stream/diff to polars-plan/abs
  • Newline escaping in streaming show_graph
  • Only allow dimension inference (-1) on the first Expr.reshape dimension
  • Sink batches early stop on in-memory engine
  • More precisely model expression ordering requirements
  • Panic in zero-weight rolling mean/var
  • Decimal <-> literal arithmetic supertype rules
  • Match various aggregation return types in the streaming engine with the in-memory engine
  • Validate list type for list expressions in planner
  • Fix scan_iceberg() storage options not taking effect
  • Have log() prioritize the leftmost dtype for its output dtype
  • Fix incorrect pl.len() result on CSV scans
  • Add support for float inputs for duration types
  • Roundtrip empty string through hive partitioning
  • Fix potential OOB writes in unaligned IPC read
  • Fix regression error when scanning AWS presigned URL
  • Make PlPath::join for cloud paths replace on absolute paths
  • Correct dtype for cum_agg in streaming engine
  • Restore support for np.datetime64() in pl.lit()
  • Ignore Iceberg list element ID if missing
  • Fix panic on streaming full join with coalesce
  • Fix AggState on all_literal in BinaryExpr
  • Show IR sort options in explain
  • Fix schema on ApplyExpr with single row literal in agg context
  • Fix planner schema for dividing pl.Float32 by int
  • Fix panic scanning from AWS legacy global endpoint URL
  • Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data
  • Do not propagate struct of nulls with null
  • Be stricter with invalid NDJSON input when ignore_errors=False
  • Implement approx_n_unique for temporal dtypes and Null

🔧 Affected Symbols

LazyFrame.{sink,collect}_batches · dt.total_{} · read_excel · scan_parquet · pl.Config.set_default_credential_provider · Categories · AggregatedScalar · ApplyExpr · scan_iceberg() · dt.month_end · from_dicts · Expr.reshape · log() · pl.len() · PlPath::join · pl.lit() · BinaryExpr · explain · iterable_to_pydf