Changelog

py-1.34.0

📦 polars · View on GitHub →

✨ 17 features · 🐛 43 fixes · 🔧 22 symbols

Summary

This release introduces new batch collection methods for LazyFrames and significant performance optimizations across various operations, including native streaming support for gather_every and mode(). Numerous bug fixes address issues related to CSV parsing, streaming engine consistency, and cloud storage scanning.

✨ New Features

  • Added LazyFrame.{sink,collect}_batches methods.
  • Implemented deterministic import order for Python Polars package variants.
  • Implemented maintain_order for cross join.
  • Added support to output dt.total_{}() duration values as fractionals.
  • Avoided forcing a pyarrow dependency in read_excel when using the default "calamine" engine.
  • Added support for scanning from file:/path URIs.
  • Added logging to show which file the schema was sourced from, and which file caused an extra column error.
  • Added support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid.
  • Added unstable hidden_file_prefix parameter to scan_parquet.
  • Used fixed-scale Decimals.
  • Added support for unsigned 128-bit integers.
  • Added unstable pl.Config.set_default_credential_provider.
  • Enabled roundtrip of BinaryOffset type through Parquet.
  • Added opt-in unstable functionality to load interval types as Struct.
  • Added support for reading parquet metadata from cloud storage.
  • Added user guide section on AWS role assumption.
  • Added support for unique / n_unique / arg_unique for array columns.

🐛 Bug Fixes

  • Fixed parsing of Decimal with comma as decimal separator in CSV.
  • Made Categories pickleable.
  • Fixed shift operation on array within list.
  • Fixed handling of AggregatedScalar in ApplyExpr single input.
  • Fixed support for reading mixed compressed/uncompressed IPC buffers.
  • Fixed overflow in slice-slice optimization.
  • Fixed package discovery for setuptools.
  • Added type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction.
  • Removed inclusion of polars dir in runtime sdist/wheel.
  • Fixed dt.month_end method unnecessarily raising when the month-start timestamp was ambiguous.
  • Widened from_dicts to Iterable[Mapping[str, Any]].
  • Fixed unsupported arrow type Dictionary error in scan_iceberg().
  • Raised Exception instead of panic when unnesting on non-struct column.
  • Included missing feature dependency from polars-stream/diff to polars-plan/abs.
  • Fixed newline escaping in streaming show_graph.
  • Prevented dimension inference (-1) for any Expr.reshape dimension except the first.
  • Fixed sink batches early stop on in-memory engine.
  • Modeled expression ordering requirements more precisely.
  • Fixed panic in zero-weight rolling mean/var.
  • Fixed Decimal <-> literal arithmetic supertype rules.
  • Matched various aggregation return types in the streaming engine with the in-memory engine.
  • Validated list type for list expressions in planner.
  • Fixed scan_iceberg() storage options not taking effect.
  • Made log() prioritize the leftmost dtype for its output dtype.
  • Fixed CSV pl.len() calculation.
  • Added support for float inputs for duration types.
  • Fixed roundtrip of empty string through hive partitioning.
  • Fixed potential OOB writes in unaligned IPC read.
  • Fixed regression error when scanning AWS presigned URL.
  • Made PlPath::join for cloud paths replace on absolute paths.
  • Corrected dtype for cum_agg in streaming engine.
  • Restored support for np.datetime64() in pl.lit().
  • Ignored Iceberg list element ID if missing.
  • Fixed panic on streaming full join with coalesce.
  • Fixed AggState on all_literal in BinaryExpr.
  • Showed IR sort options in explain.
  • Fixed schema on ApplyExpr with single row literal in agg context.
  • Fixed planner schema for dividing pl.Float32 by int.
  • Fixed panic scanning from AWS legacy global endpoint URL.
  • Fixed iterable_to_pydf(..., infer_schema_length=None) to scan all data.
  • No longer propagates a struct of nulls with null.
  • Became stricter with invalid NDJSON input when ignore_errors=False.
  • Implemented approx_n_unique for temporal dtypes and Null.

🔧 Affected Symbols

LazyFrame.{sink,collect}_batches, read_excel, scan_parquet, pl.Config.set_default_credential_provider, dt.total_{}, scan_iceberg, AggregatedScalar, ApplyExpr, Categories, slice-slice optimization, setuptools, GenericFirstLastGroupedReduction, dt.month_end, from_dicts, unnest, Expr.reshape, pl.len(), pl.lit(), PlPath::join, cum_agg, iterable_to_pydf, approx_n_unique