Changelog

v4.0.1

Breaking Changes
📦 sentence-transformers
⚠️ 3 breaking · ✨ 17 features · ⚡ 3 deprecations · 🔧 8 symbols

Summary

Version 4.0.1 introduces a complete overhaul of the CrossEncoder training pipeline with a new `CrossEncoderTrainer`, dataset‑based inputs, multi‑GPU and bf16 support, and many training‑related enhancements, while keeping inference unchanged.
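
Inference is indeed unaffected; as a quick sanity check, code like the following (the checkpoint name is just an illustrative public cross-encoder) runs exactly as it did before:

```python
from sentence_transformers import CrossEncoder

# Loading and prediction are unchanged in 4.0.1.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([
    ("How many people live in Berlin?", "Berlin has roughly 3.7 million inhabitants."),
    ("How many people live in Berlin?", "The capital of France is Paris."),
])
print(scores)  # one relevance score per (query, passage) pair
```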

⚠️ Breaking Changes

  • The old training workflow using `InputExample`, `DataLoader` and `model.fit()` has been removed. Switch to the new `CrossEncoderTrainer` with a `datasets.Dataset` or `DatasetDict` and a `CrossEncoderTrainingArguments` instance.
  • The `loss` argument now expects a loss object or a dictionary of loss objects keyed by dataset name; passing a single loss for a `DatasetDict` without a matching dict will raise an error (see the sketch after this list).
  • Automatic model card generation now overwrites previously written manual cards; if you rely on a custom card, disable automatic generation or edit the generated card after training.
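
For the multi-dataset case in particular, here is a minimal sketch of the expected shape (the dataset names, checkpoint, and toy rows are illustrative only):

```python
from datasets import Dataset, DatasetDict
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)

# Two toy datasets; real training sets would have many rows.
train_dataset = DatasetDict({
    "duplicates": Dataset.from_dict({
        "sentence1": ["How do I reset my password?"],
        "sentence2": ["What are the steps to change my password?"],
        "label": [1.0],
    }),
    "relevance": Dataset.from_dict({
        "query": ["best pizza in town"],
        "passage": ["Mario's serves award-winning pizza downtown."],
        "label": [1.0],
    }),
})

# The keys of the loss dict must match the DatasetDict keys.
loss = {
    "duplicates": BinaryCrossEntropyLoss(model),
    "relevance": BinaryCrossEntropyLoss(model),
}

args = CrossEncoderTrainingArguments(output_dir="models/multi-task-ce")
trainer = CrossEncoderTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```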

Migration Steps

  1. Replace any usage of `model.fit(...)` with a `CrossEncoderTrainer` instance and call `trainer.train()`; an end-to-end sketch follows these steps.
  2. Convert training data from lists of `InputExample` or `DataLoader` objects to a HuggingFace `datasets.Dataset` (or `DatasetDict`).
  3. If you train on multiple datasets, provide a dictionary of loss objects keyed by the dataset names and pass a matching dictionary to the `loss` parameter.
  4. Import and use `CrossEncoderTrainingArguments` (or the base `TrainingArguments`) instead of the old argument objects.
  5. Update imports to include `CrossEncoderTrainer` and, if needed, `CrossEncoderTrainingArguments` from `sentence_transformers`.
  6. Install the `[train]` extra (`pip install sentence-transformers[train]==4.0.1`) to get the new training dependencies.
  7. If you rely on custom model cards, disable the automatic generation or edit the generated card after training.
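
Put together, a minimal end-to-end migration might look like the following sketch (the checkpoint, column names, and hyperparameters are placeholders; the columns a dataset needs depend on the loss you choose):

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# Before (removed): InputExample + DataLoader + model.fit(...)
# After: a datasets.Dataset, a loss object, and CrossEncoderTrainer.

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)

train_dataset = Dataset.from_dict({
    "sentence1": ["A man is eating food.", "A plane is taking off."],
    "sentence2": ["A man eats something.", "A bird is flying."],
    "label": [1.0, 0.0],
})

args = CrossEncoderTrainingArguments(
    output_dir="models/my-cross-encoder",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=BinaryCrossEntropyLoss(model),
)
trainer.train()
model.save_pretrained("models/my-cross-encoder/final")
```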

✨ New Features

  • Multi‑GPU training support with both Data Parallelism (DP) and Distributed Data Parallelism (DDP).
  • Native bf16 training support.
  • Loss logging during training.
  • Evaluation datasets and evaluation loss integration.
  • Enhanced callback system with built‑in support for Weights & Biases, TensorBoard, CodeCarbon and custom callbacks.
  • Gradient checkpointing to reduce memory usage.
  • Gradient accumulation support.
  • Improved automatic model‑card generation for trained models.
  • Configurable warmup ratio in training arguments.
  • Automatic push of model checkpoints to the Hugging Face Hub.
  • Resume training from a saved checkpoint.
  • Hyperparameter optimization integration.
  • New `CrossEncoderTrainer` class built on 🤗 Transformers `Trainer`.
  • New `CrossEncoderTrainingArguments` subclass for detailed training configuration (see the configuration sketch after this list).
  • Ability to train on `datasets.Dataset` or `datasets.DatasetDict` instead of lists of `InputExample`.
  • Support for providing a dictionary of loss functions when training with multiple datasets.
  • An optional `SentenceEvaluator` can be used alongside the evaluation loss.
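
Most of these are plain `CrossEncoderTrainingArguments` fields inherited from the 🤗 Transformers `TrainingArguments`; here is a sketch exercising several of them (the values are illustrative rather than tuned recommendations, and the Hub repo id is hypothetical):

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="models/my-cross-encoder",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    bf16=True,                       # native bf16 training
    warmup_ratio=0.1,                # configurable warmup
    gradient_accumulation_steps=4,   # gradient accumulation
    gradient_checkpointing=True,     # trade compute for lower memory usage
    eval_strategy="steps",           # run evaluation (and log eval loss) during training
    eval_steps=500,
    logging_steps=100,               # training-loss logging
    report_to=["tensorboard"],       # built-in callback integrations (W&B, TensorBoard, ...)
    push_to_hub=True,                # push checkpoints to the Hugging Face Hub
    hub_model_id="my-user/my-cross-encoder",  # hypothetical repo id
)

# Resuming from a saved checkpoint is the standard Trainer mechanism:
#     trainer.train(resume_from_checkpoint=True)
```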

🔧 Affected Symbols

`CrossEncoder` · `CrossEncoderTrainer` · `CrossEncoderTrainingArguments` · `BinaryCrossEntropyLoss` · `CachedMultipleNegativesRankingLoss` · `SentenceEvaluator` · `InputExample` · `model.fit`

⚡ Deprecations

  • The `InputExample` class for training is deprecated.
  • The `model.fit` method on `CrossEncoder` is deprecated.
  • Passing legacy training arguments instead of a `CrossEncoderTrainingArguments` instance is deprecated; the old pattern is shown below for recognition.
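
For recognition only, the deprecated pre-4.0 pattern looked roughly like this; if your code still resembles it, apply the migration steps above (checkpoint and samples are illustrative):

```python
# Deprecated pre-4.0 workflow, shown only so it can be recognized and replaced:
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
train_samples = [
    InputExample(texts=["A man is eating food.", "A man eats something."], label=1.0),
    InputExample(texts=["A plane is taking off.", "A bird is flying."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```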