v4.0.1
Breaking Changes · 📦 sentence-transformers
⚠ 3 breaking · ✨ 17 features · ⚡ 3 deprecations · 🔧 8 symbols
Summary
Version 4.0.1 introduces a complete overhaul of the CrossEncoder training pipeline with a new `CrossEncoderTrainer`, dataset‑based inputs, multi‑GPU and bf16 support, and many training‑related enhancements, while keeping inference unchanged.
⚠️ Breaking Changes
- The old training workflow using `InputExample`, `DataLoader`, and `model.fit()` is deprecated and slated for removal. Switch to the new `CrossEncoderTrainer` with a `datasets.Dataset` or `DatasetDict` and a `CrossEncoderTrainingArguments` instance.
- The `loss` argument now expects a loss object, or a dictionary of loss objects keyed by dataset name; passing a single loss for a `DatasetDict` without a matching dictionary will raise an error (see the multi-dataset sketch after this list).
- Automatic model card generation now overwrites manually written cards; if you rely on a custom card, disable the automatic generation or re-edit the card after training.
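A minimal sketch of the multi-dataset requirement; the dataset names (`nli`, `pairs`), the base checkpoint, and the toy data are illustrative assumptions, not values from this release:

```python
from datasets import Dataset, DatasetDict
from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Placeholder base model; num_labels=1 is required by BinaryCrossEntropyLoss.
model = CrossEncoder("distilroberta-base", num_labels=1)

# Hypothetical datasets; the keys are arbitrary but must match the loss dict.
train_dataset = DatasetDict({
    "nli": Dataset.from_dict({
        "premise": ["A man is eating."],
        "hypothesis": ["Someone is eating."],
        "label": [1.0],
    }),
    "pairs": Dataset.from_dict({
        "question": ["How do I bake bread?"],
        "answer": ["The capital of France is Paris."],
        "label": [0.0],
    }),
})

# One loss per dataset, keyed by the same names as the DatasetDict.
losses = {
    "nli": BinaryCrossEntropyLoss(model),
    "pairs": BinaryCrossEntropyLoss(model),
}

trainer = CrossEncoderTrainer(model=model, train_dataset=train_dataset, loss=losses)
trainer.train()
```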
Migration Steps
- Replace any usage of `model.fit(...)` with a `CrossEncoderTrainer` instance and call `trainer.train()` (see the end-to-end sketch after this list).
- Convert training data from lists of `InputExample` objects (or `DataLoader`s over them) to a Hugging Face `datasets.Dataset` (or `DatasetDict`).
- If you train on multiple datasets, pass a dictionary of loss objects keyed by the dataset names to the `loss` parameter.
- Configure training through `CrossEncoderTrainingArguments` (or the base `transformers.TrainingArguments`) instead of the old `model.fit` keyword arguments.
- Update imports to include `CrossEncoderTrainer` and, if needed, `CrossEncoderTrainingArguments` from `sentence_transformers`.
- Install the `[train]` extra (`pip install sentence-transformers[train]==4.0.1`) to get the new training dependencies.
- If you rely on custom model cards, disable the automatic generation or edit the generated card after training.
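Put together, a minimal end-to-end sketch of the new workflow; the checkpoint name, output directory, and toy data are placeholders:

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Placeholder base model; num_labels=1 gives a single relevance score.
model = CrossEncoder("distilroberta-base", num_labels=1)

# Replaces the old list of InputExample objects wrapped in a DataLoader.
train_dataset = Dataset.from_dict({
    "sentence1": ["How do I bake bread?", "How do I bake bread?"],
    "sentence2": ["Mix flour, water, and yeast.", "The capital of France is Paris."],
    "label": [1.0, 0.0],
})

args = CrossEncoderTrainingArguments(
    output_dir="models/reranker-demo",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

# Replaces model.fit(...).
trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=BinaryCrossEntropyLoss(model),
)
trainer.train()
```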
✨ New Features
- Multi‑GPU training support with both Data Parallelism (DP) and Distributed Data Parallelism (DDP).
- Native bf16 training support (see the configuration sketch after this list).
- Loss logging during training.
- Evaluation datasets and evaluation loss integration.
- Enhanced callback system with built‑in support for Weights & Biases, TensorBoard, CodeCarbon and custom callbacks.
- Gradient checkpointing to reduce memory usage.
- Gradient accumulation support.
- Improved automatic model‑card generation for trained models.
- Configurable warmup ratio in training arguments.
- Automatic push of model checkpoints to the Hugging Face Hub.
- Resume training from a saved checkpoint.
- Hyperparameter optimization integration.
- New `CrossEncoderTrainer` class built on 🤗 Transformers `Trainer`.
- New `CrossEncoderTrainingArguments` subclass for detailed training configuration.
- Ability to train on `datasets.Dataset` or `datasets.DatasetDict` instead of lists of `InputExample`.
- Support for providing a dictionary of loss functions when training with multiple datasets.
- An optional `SentenceEvaluator` can be used alongside the evaluation loss.
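Many of these features are switched on through `CrossEncoderTrainingArguments`, which extends 🤗 Transformers `TrainingArguments`. A configuration sketch with illustrative values (the output directory and Hub ID are placeholders):

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="models/reranker-demo",   # hypothetical path
    bf16=True,                           # native bf16 training
    gradient_checkpointing=True,         # trade compute for memory
    gradient_accumulation_steps=4,       # larger effective batch size
    warmup_ratio=0.1,                    # configurable warmup
    logging_steps=100,                   # loss logging cadence
    eval_strategy="steps",               # evaluate during training
    eval_steps=500,
    report_to=["wandb", "tensorboard"],  # built-in callback integrations
    push_to_hub=True,                    # push checkpoints to the Hub
    hub_model_id="your-username/reranker-demo",  # placeholder repo ID
)
```

Resuming from a saved checkpoint is then a matter of calling `trainer.train(resume_from_checkpoint=True)`, and DDP runs are launched externally in the usual 🤗 Transformers way (e.g. `torchrun --nproc_per_node=4 train.py`).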
🔧 Affected Symbols
- `CrossEncoder`
- `CrossEncoderTrainer`
- `CrossEncoderTrainingArguments`
- `BinaryCrossEntropyLoss`
- `CachedMultipleNegativesRankingLoss`
- `SentenceEvaluator`
- `InputExample`
- `model.fit`
⚡ Deprecations
- The `InputExample` class for training is deprecated.
- The `model.fit` method on `CrossEncoder` is deprecated.
- Passing legacy training arguments instead of using `CrossEncoderTrainingArguments` is deprecated (the full legacy pattern is sketched below).
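For reference, the deprecated pre-4.0 pattern to look for in existing code (a sketch with placeholder data, shown only for recognition):

```python
# Deprecated: InputExample + DataLoader + model.fit, superseded by CrossEncoderTrainer.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

model = CrossEncoder("distilroberta-base", num_labels=1)
train_samples = [
    InputExample(texts=["How do I bake bread?", "Mix flour, water, and yeast."], label=1.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```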