ray-2.54.0
⚠ 2 breaking · ✨ 16 features · 🐛 30 fixes · ⚡ 1 deprecation · 🔧 18 symbols
Summary
This release introduces significant new features in Ray Data, including checkpointing, expanded Compute Expressions, and Databricks UC credential support, alongside major enhancements to cluster autoscaling and performance optimizations. Ray Serve gains queue-based autoscaling for TaskConsumers and improved deployment observability.
⚠️ Breaking Changes
- Removed top-level `ray.data` imports to decouple Ray Train from Ray Data. Users must now import necessary components directly from `ray.data` submodules or use the new extension type locations.
- Removed deprecated constant `TENSOR_COLUMN_NAME`. Users relying on this constant must update their code.
Migration Steps
- If you were using top-level imports from `ray.data`, update them to import from specific submodules or use the new locations for extension types.
- If you were using the deprecated constant `TENSOR_COLUMN_NAME`, replace it with the appropriate alternative.
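A quick way to find code affected by the `TENSOR_COLUMN_NAME` removal is to scan a project for the identifier. The following is a minimal sketch using only the standard library; `find_references` and `scan_tree` are hypothetical helper names, not part of Ray, and the whole-word match will also flag mentions in comments and strings.

```python
import re
from pathlib import Path

# Constant named in the breaking-changes list above.
REMOVED_NAME = "TENSOR_COLUMN_NAME"


def find_references(source: str, name: str = REMOVED_NAME) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line mentioning `name` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def scan_tree(root: str, name: str = REMOVED_NAME) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root`; keep only files with at least one hit."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        found = find_references(text, name)
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_tree(".")` from a project root returns a mapping from file path to the matching lines, which can serve as a checklist before upgrading.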
✨ New Features
- Added checkpointing support to Ray Data.
- Added support for list operations, fixed-size arrays, string padding, logarithmic, trigonometric, arithmetic, and rounding operations in Compute Expressions.
- Added `sql_params` support to `read_sql`.
- Added `AsList` aggregation.
- Added support for `CountDistinct` aggregate.
- Added credential provider abstraction for Databricks UC datasource.
- Support callable classes for `UDFExpr`.
- Added autoscaler metrics to Data Dashboard.
- Added optional filesystem parameter to download expression.
- Allow specifying partitioning style or flavor in `write_parquet()`.
- New cluster autoscaler enabled by default.
- Ray Serve: Introduced queue-based autoscaling for TaskConsumer deployments (phase 1) using a `QueueMonitor` actor.
- Ray Serve: Added `apply_autoscaling_config` decorator for custom autoscaling policies to inherit default parameters.
- Ray Serve: Added `label_selector` and `bundle_label_selector` to Serve deployments for hardware targeting.
- Ray Serve: Deployment-level autoscaling observability via structured JSON `serve_autoscaling_snapshot` logs.
- Ray Serve: Batching now guarantees each batch contains requests for the same multiplexed model when using `@serve.batch`.
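To illustrate the `@serve.batch` guarantee in the last item, here is a minimal pure-Python sketch of splitting pending requests so that each batch contains a single multiplexed model ID. It does not use or reproduce Ray Serve's internals; `Request`, `model_id`, and `make_batches` are hypothetical names chosen for the example.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Request:
    model_id: str  # hypothetical field: which multiplexed model the request targets
    payload: str


def make_batches(pending: list[Request], max_batch_size: int) -> list[list[Request]]:
    """Group requests by model ID, then cut each group into batches of at most max_batch_size."""
    groups: "OrderedDict[str, list[Request]]" = OrderedDict()
    for req in pending:
        # Requests keep their arrival order within each model's group.
        groups.setdefault(req.model_id, []).append(req)

    batches: list[list[Request]] = []
    for reqs in groups.values():
        for start in range(0, len(reqs), max_batch_size):
            batches.append(reqs[start:start + max_batch_size])
    return batches
```

Every batch returned by `make_batches` has exactly one distinct `model_id`, so a batched handler in this sketch never needs to switch models within a single call.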
🐛 Bug Fixes
- Fixed `MapBatches` fusion when row count is modified.
- Fixed limit pushdown so that limits are no longer pushed past `map_batches` by default.
- Fixed the wrong type hint for the other dataset in `zip` and `union` operations.
- Fixed `ActorPoolMapOperator` to guarantee dispatch of all given inputs.
- Fixed `ArrowInvalid` error when backfilling missing fields from map tasks.
- Fixed attribute error in `UnionOperator.clear_internal_output_queue`.
- Fixed `DefaultClusterAutoscalerV2` raising `KeyError: 'CPU'`.
- Fixed `ReorderingBundleQueue` handling of empty output sequences.
- Fixed task completion time metric name for the backpressure Grafana panel.
- Fixed Union operator blocking when `preserve_order` is set.
- Fixed autoscaler requesting empty resources instead of previous allocation when not scaling up.
- Fixed autoscaler not respecting user-configured resource limits.
- Fixed `DefaultAutoscalerV2` not scaling nodes from zero.
- Fixed Iceberg warning message.
- Fixed Parquet datasource path column support.
- Fixed `ProgressBar` when using `use_ray_tqdm`.
- Fixed stale stats on refit for preprocessors.
- Fixed `StreamingRepartition` hang with empty upstream results.
- Fixed an operator fusion bug so that fused operators preserve whether a UDF modifies the row count.
- Fixed `AutoscalingCoordinator` double-allocating resources for multiple datasets.
- Fixed `DownstreamCapacityBackpressurePolicy` issues.
- Fixed `AutoscalingCoordinator` crash when requesting 0 GPUs on CPU-only cluster.
- Fixed `TensorArray` to `Arrow` tensor conversion.
- Fixed resource allocator not respecting max resource requirement.
- Fixed GPU autoscaling when `max_actors` is set.
- Fixed checkpoint filter PyArrow zero-copy conversion error.
- Restored class aliases to fix deserialization of existing datasets.
- Fixed `DataContext` deserialization issue with `StatsActor`.
- Ray Serve: Fixed `lookback_period_s` validation to ensure it is greater than `metrics_interval_s`.
- Ray Serve: Fixed `root_path` support for uvicorn versions >=0.26.0.
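The `lookback_period_s` fix above amounts to a cross-field check between two autoscaling settings. The sketch below shows that rule in isolation; `validate_autoscaling_windows` is a hypothetical function, and Ray Serve's actual validation code and error message may differ.

```python
def validate_autoscaling_windows(lookback_period_s: float, metrics_interval_s: float) -> None:
    """Raise ValueError unless the lookback window is strictly longer than the metrics interval."""
    if lookback_period_s <= metrics_interval_s:
        # Error wording is illustrative only, not Ray Serve's actual message.
        raise ValueError(
            f"lookback_period_s ({lookback_period_s}) must be greater than "
            f"metrics_interval_s ({metrics_interval_s})"
        )


# A 30 s lookback with a 10 s metrics interval passes the check.
validate_autoscaling_windows(lookback_period_s=30.0, metrics_interval_s=10.0)
```

The comparison is strict, matching the "greater than" wording in the note: equal values are rejected.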
⚡ Deprecations
- The `Sort` operator no longer requires `preserve_order`.