v5.2.0
📦 sentence-transformers
⚠ 1 breaking · ✨ 6 features · 🐛 2 fixes · ⚡ 2 deprecations · 🔧 8 symbols
Summary
Version 5.2.0 adds multiprocessing to `CrossEncoder`, multilingual NanoBEIR evaluation, similarity scores in hard‑negative mining, and support for Transformers 5.x. It also deprecates Python 3.9 and the old `n-tuple-scores` format.
⚠️ Breaking Changes
- The old `n-tuple-scores` output format for `mine_hard_negatives` has been removed; use `output_format="n-tuple"` with `output_scores=True` instead.
Migration Steps
- If you used the old `n-tuple-scores` format, replace it with `output_format="n-tuple"` and set `output_scores=True`.
- To use Transformers 5.x, upgrade the `transformers` package to >=5.0; this release adds compatibility with it.
- To run `CrossEncoder` inference across multiple processes, create a pool with `pool = model.start_multi_process_pool()`, pass `pool=pool` to `predict` or `rank`, and call `model.stop_multi_process_pool(pool)` once inference is done. Alternatively, pass a list of devices (e.g., `device=["cpu"]*4`).
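The `n-tuple-scores` migration step above can be sketched as a small keyword-argument rewrite. The helper name `migrate_mining_kwargs` is hypothetical and not part of sentence-transformers; it only illustrates the mapping from the old format name to the new `output_format` and `output_scores` pair.

```python
def migrate_mining_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with the old format name translated.

    Hypothetical helper for illustration only: it maps
    output_format="n-tuple-scores" to output_format="n-tuple"
    plus output_scores=True and leaves every other key untouched.
    """
    migrated = dict(kwargs)  # copy, so the caller's dict is not mutated
    if migrated.get("output_format") == "n-tuple-scores":
        migrated["output_format"] = "n-tuple"
        migrated["output_scores"] = True
    return migrated


old_kwargs = {"num_negatives": 5, "output_format": "n-tuple-scores"}
new_kwargs = migrate_mining_kwargs(old_kwargs)
print(new_kwargs)
```

The migrated dictionary can then be unpacked into the mining call, for example `mine_hard_negatives(dataset, model, **new_kwargs)`, assuming `dataset` and `model` are already set up as in your existing code.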
✨ New Features
- CrossEncoder now supports multiprocessing via `start_multi_process_pool`, `stop_multi_process_pool`, and the `pool` argument to `predict` and `rank`.
- Providing a list of devices to `CrossEncoder` automatically creates a multiprocessing pool for faster CPU or multi‑GPU inference.
- NanoBEIR evaluators accept a `dataset_id` parameter, enabling evaluation on multilingual NanoBEIR collections.
- `mine_hard_negatives` adds an `output_scores` parameter to export similarity scores alongside mined negatives.
- Support for Transformers library version 5.x.
- Improved handling of datasets with multiple positive passages in hard‑negative mining.
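The `CrossEncoder` pool workflow described above might look roughly like the following sketch. The wrapper `predict_with_pool` is a hypothetical name, and the sketch assumes only the three calls named in these notes (`start_multi_process_pool()`, `predict(..., pool=pool)`, `stop_multi_process_pool(pool)`); any further arguments those methods accept are not shown here.

```python
def predict_with_pool(model, pairs):
    """Score (query, passage) pairs using a temporary multiprocessing pool.

    `model` is assumed to be a sentence_transformers.CrossEncoder from
    v5.2.0 or newer; this wrapper itself is illustrative, not library API.
    """
    pool = model.start_multi_process_pool()
    try:
        return model.predict(pairs, pool=pool)
    finally:
        # Always shut the worker processes down, even if predict() raises.
        model.stop_multi_process_pool(pool)


# Example usage (requires sentence-transformers >= 5.2.0; the checkpoint
# name is only an example). Place it under `if __name__ == "__main__":`
# in a script, since worker processes are started:
#
#     from sentence_transformers import CrossEncoder
#     model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
#     pairs = [("what is a pool?", "A pool is a group of worker processes."),
#              ("what is a pool?", "Paris is the capital of France.")]
#     scores = predict_with_pool(model, pairs)
```

Wrapping the pool in `try`/`finally` keeps worker processes from lingering if inference fails partway through.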
🐛 Bug Fixes
- Fixed several issues when datasets contain multiple positive passages during hard‑negative mining.
- Resolved bugs related to multi‑GPU usage in `mine_hard_negatives`.
🔧 Affected Symbols
- `CrossEncoder`
- `CrossEncoder.start_multi_process_pool`
- `CrossEncoder.stop_multi_process_pool`
- `CrossEncoder.predict`
- `CrossEncoder.rank`
- `mine_hard_negatives`
- `NanoBEIREvaluator`
- `SentenceTransformer`
⚡ Deprecations
- Python 3.9 support is deprecated; upgrade to Python 3.10 or newer.
- The `n-tuple-scores` format in `mine_hard_negatives` is deprecated; use `output_format="n-tuple"` with `output_scores=True` instead.
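If a project needs to flag the Python 3.9 deprecation early, a startup check along these lines is one option. The function name `python_is_supported` and the warning text are assumptions made for this sketch, not something the library provides.

```python
import sys
import warnings


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the given interpreter version is 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


if not python_is_supported():
    # Hypothetical project-level warning; the wording is illustrative.
    warnings.warn(
        "Python 3.9 support is deprecated in sentence-transformers 5.2.0; "
        "upgrade to Python 3.10 or newer.",
        DeprecationWarning,
    )
```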