v4.0.1
Breaking Changes · 📦 sentence-transformers
⚠ 3 breaking · ✨ 17 features · ⚡ 3 deprecations · 🔧 8 symbols
Summary
Version 4.0.1 introduces a complete overhaul of the CrossEncoder training pipeline with a new `CrossEncoderTrainer`, dataset‑based inputs, multi‑GPU and bf16 support, and many training‑related enhancements, while keeping inference unchanged.
⚠️ Breaking Changes
- The old training workflow using `InputExample`, `DataLoader`, and `model.fit()` is deprecated and slated for removal. Switch to the new `CrossEncoderTrainer` with a `datasets.Dataset` or `DatasetDict` and a `CrossEncoderTrainingArguments` instance.
- The `loss` argument now expects a loss object, or a dictionary of loss objects keyed by dataset name; passing a single loss for a `DatasetDict` without a matching dictionary will raise an error (see the multi-dataset sketch after this list).
- Automatic model card generation now overwrites manually written cards; if you rely on a custom card, disable the automatic generation or re-edit the card after training.
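A minimal sketch of the multi-dataset requirement; the dataset names (`nli`, `pairs`), the base checkpoint, and the toy data are illustrative assumptions, not values from this release:

```python
from datasets import Dataset, DatasetDict
from sentence_transformers.cross_encoder import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Placeholder base model; num_labels=1 is required by BinaryCrossEntropyLoss.
model = CrossEncoder("distilroberta-base", num_labels=1)

# Hypothetical datasets; the keys are arbitrary but must match the loss dict.
train_dataset = DatasetDict({
    "nli": Dataset.from_dict({
        "premise": ["A man is eating."],
        "hypothesis": ["Someone is eating."],
        "label": [1.0],
    }),
    "pairs": Dataset.from_dict({
        "question": ["How do I bake bread?"],
        "answer": ["The capital of France is Paris."],
        "label": [0.0],
    }),
})

# One loss per dataset, keyed by the same names as the DatasetDict.
losses = {
    "nli": BinaryCrossEntropyLoss(model),
    "pairs": BinaryCrossEntropyLoss(model),
}

trainer = CrossEncoderTrainer(model=model, train_dataset=train_dataset, loss=losses)
trainer.train()
```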
Migration Steps
- Replace any usage of `model.fit(...)` with a `CrossEncoderTrainer` instance and call `trainer.train()` (see the end-to-end sketch after this list).
- Convert training data from lists of `InputExample` objects (or `DataLoader`s over them) to a Hugging Face `datasets.Dataset` (or `DatasetDict`).
- If you train on multiple datasets, pass a dictionary of loss objects keyed by the dataset names to the `loss` parameter.
- Configure training through `CrossEncoderTrainingArguments` (or the base `transformers.TrainingArguments`) instead of the old `model.fit` keyword arguments.
- Update imports to include `CrossEncoderTrainer` and, if needed, `CrossEncoderTrainingArguments` from `sentence_transformers`.
- Install the `[train]` extra (`pip install sentence-transformers[train]==4.0.1`) to get the new training dependencies.
- If you rely on custom model cards, disable the automatic generation or edit the generated card after training.
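Put together, a minimal end-to-end sketch of the new workflow; the checkpoint name, output directory, and toy data are placeholders:

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Placeholder base model; num_labels=1 gives a single relevance score.
model = CrossEncoder("distilroberta-base", num_labels=1)

# Replaces the old list of InputExample objects wrapped in a DataLoader.
train_dataset = Dataset.from_dict({
    "sentence1": ["How do I bake bread?", "How do I bake bread?"],
    "sentence2": ["Mix flour, water, and yeast.", "The capital of France is Paris."],
    "label": [1.0, 0.0],
})

args = CrossEncoderTrainingArguments(
    output_dir="models/reranker-demo",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

# Replaces model.fit(...).
trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=BinaryCrossEntropyLoss(model),
)
trainer.train()
```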
✨ New Features
- Multi‑GPU training support with both Data Parallelism (DP) and Distributed Data Parallelism (DDP).
- Native bf16 training support (see the configuration sketch after this list).
- Loss logging during training.
- Evaluation datasets and evaluation loss integration.
- Enhanced callback system with built‑in support for Weights & Biases, TensorBoard, CodeCarbon and custom callbacks.
- Gradient checkpointing to reduce memory usage.
- Gradient accumulation support.
- Improved automatic model‑card generation for trained models.
- Configurable warmup ratio in training arguments.
- Automatic push of model checkpoints to the Hugging Face Hub.
- Resume training from a saved checkpoint.
- Hyperparameter optimization integration.
- New `CrossEncoderTrainer` class built on 🤗 Transformers `Trainer`.
- New `CrossEncoderTrainingArguments` subclass for detailed training configuration.
- Ability to train on `datasets.Dataset` or `datasets.DatasetDict` instead of lists of `InputExample`.
- Support for providing a dictionary of loss functions when training with multiple datasets.
- An optional `SentenceEvaluator` can be used alongside the evaluation loss.
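Many of these features are switched on through `CrossEncoderTrainingArguments`, which extends 🤗 Transformers `TrainingArguments`. A configuration sketch with illustrative values (the output directory and Hub ID are placeholders):

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="models/reranker-demo",   # hypothetical path
    bf16=True,                           # native bf16 training
    gradient_checkpointing=True,         # trade compute for memory
    gradient_accumulation_steps=4,       # larger effective batch size
    warmup_ratio=0.1,                    # configurable warmup
    logging_steps=100,                   # loss logging cadence
    eval_strategy="steps",               # evaluate during training
    eval_steps=500,
    report_to=["wandb", "tensorboard"],  # built-in callback integrations
    push_to_hub=True,                    # push checkpoints to the Hub
    hub_model_id="your-username/reranker-demo",  # placeholder repo ID
)
```

Resuming from a saved checkpoint is then a matter of calling `trainer.train(resume_from_checkpoint=True)`, and DDP runs are launched externally in the usual 🤗 Transformers way (e.g. `torchrun --nproc_per_node=4 train.py`).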
🔧 Affected Symbols
- `CrossEncoder`
- `CrossEncoderTrainer`
- `CrossEncoderTrainingArguments`
- `BinaryCrossEntropyLoss`
- `CachedMultipleNegativesRankingLoss`
- `SentenceEvaluator`
- `InputExample`
- `model.fit`
⚡ Deprecations
- The `InputExample` class for training is deprecated.
- The `model.fit` method on `CrossEncoder` is deprecated.
- Passing legacy training arguments instead of using `CrossEncoderTrainingArguments` is deprecated (the full legacy pattern is sketched below).
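For reference, the deprecated pre-4.0 pattern to look for in existing code (a sketch with placeholder data, shown only for recognition):

```python
# Deprecated: InputExample + DataLoader + model.fit, superseded by CrossEncoderTrainer.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

model = CrossEncoder("distilroberta-base", num_labels=1)
train_samples = [
    InputExample(texts=["How do I bake bread?", "Mix flour, water, and yeast."], label=1.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```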