v5.1.1
Breaking Changes · 📦 sentence-transformers · View on GitHub →
⚠ 1 breaking · ✨ 3 features · 🐛 8 fixes · 🔧 8 symbols
Summary
Version 5.1.1 adds explicit validation of unsupported keyword arguments in `encode`, introduces FLOPS metrics for SparseEncoder evaluators, adds support for Knowledgeable Passage Retriever (KPR) models, and includes several bug fixes around batch-size handling, multi-GPU encoding, and evaluator output paths.
⚠️ Breaking Changes
- `SentenceTransformer.encode` now raises a `ValueError` when called with keyword arguments the underlying model does not support (e.g., `normalize` instead of `normalize_embeddings`). Previously such arguments were silently ignored, so calls that used to run without complaint may now fail. Fix this by removing unsupported kwargs or switching to the correct parameter names; `model.get_model_kwargs()` lists the extra arguments a model accepts. A sketch of the change follows.
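A minimal sketch of the new behavior, assuming any Hub checkpoint (`all-MiniLM-L6-v2` is used here only as an example):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Before 5.1.1, the typo below was silently ignored and the embeddings
# were NOT normalized. From 5.1.1 on, encode() raises a ValueError
# naming the unsupported keyword argument.
try:
    model.encode(["hello world"], normalize=True)  # wrong parameter name
except ValueError as err:
    print(err)

# Correct parameter name:
embeddings = model.encode(["hello world"], normalize_embeddings=True)
```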
Migration Steps
- Search your code for calls to `model.encode` that pass unsupported keyword arguments (e.g., `normalize`). Remove them or replace them with the correct names, such as `normalize_embeddings`.
- To see which extra kwargs a model accepts, call `model.get_model_kwargs()` and adjust your calls accordingly (see the sketch after this list).
- If you use `CrossEncoderRerankingEvaluator`, note that the `batch_size` argument is now respected; evaluations that previously ignored it may run with a different effective batch size.
- No code changes are required for the multi‑GPU fix, but be aware that embeddings are now moved to CPU before concatenation.
- If you rely on custom output directories for evaluators, you can now omit pre‑creating them; the library will create them automatically.
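A hedged migration sketch: `get_model_kwargs()` comes from this release, but the `task` kwarg checked below is purely hypothetical and stands in for whatever model-specific arguments your checkpoint actually supports.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Inspect which extra keyword arguments this model's encode() accepts.
extra_kwargs = model.get_model_kwargs()
print(extra_kwargs)

# Only forward a custom kwarg if the model actually supports it.
# NOTE: "task" is a hypothetical example of a model-specific kwarg.
kwargs = {"task": "retrieval"} if "task" in extra_kwargs else {}
embeddings = model.encode(
    ["example query"],
    normalize_embeddings=True,  # correct name; `normalize` now raises
    **kwargs,
)
```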
✨ New Features
- Added `get_model_kwargs` method to expose model‑specific extra keyword arguments and to enforce validation of kwargs in `encode`.
- Added FLOPS calculation and updated metrics in SparseEncoder evaluators (see the sketch after this list).
- Added support for Knowledgeable Passage Retriever (KPR) models.
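A rough sketch of where the new FLOPS metrics surface. The `SparseEncoder` class and the evaluator import path below are assumptions based on the 5.x sparse-encoder module, and the exact FLOPS metric key names are not specified in these notes, so the sketch simply prints every returned metric.

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.evaluation import (
    SparseInformationRetrievalEvaluator,
)

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

evaluator = SparseInformationRetrievalEvaluator(
    queries={"q1": "what is a sparse encoder?"},
    corpus={"d1": "Sparse encoders map text to high-dimensional sparse vectors."},
    relevant_docs={"q1": {"d1"}},
    name="toy",
)

# As of 5.1.1, the returned metrics include FLOPS-based sparsity statistics
# alongside the usual retrieval metrics (exact key names vary by release).
metrics = evaluator(model)
for key, value in metrics.items():
    print(key, value)
```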
🐛 Bug Fixes
- Fixed `batch_size` being ignored in `CrossEncoderRerankingEvaluator`.
- Fixed multi‑GPU processing in `encode` by moving embeddings from all devices to CPU before stacking.
- Updated `mine_hard_negatives` to use `encode_query` and `encode_document` automatically based on defined prompts.
- Fixed "Path does not exist" errors when an Evaluator is called with an `output_path` that does not yet exist (now creates missing directories).
- Corrected the reported number of missing negatives in `mine_hard_negatives`.
- Ensured that `input_ids`, `attention_mask`, `token_type_ids`, and `inputs_embeds` are always passed to the model's `forward` method.
- Explicitly imported `SentenceTransformer` class in the losses module to avoid import errors.
- Added `makedirs` handling in `InformationRetrievalEvaluator` for output paths.
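A minimal sketch of the fixed behavior, assuming the long-standing `InformationRetrievalEvaluator` API; before 5.1.1, writing results under the non-existent `results/ir-eval` directory below would have failed.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

evaluator = InformationRetrievalEvaluator(
    queries={"q1": "how do I normalize embeddings?"},
    corpus={"d1": "Pass normalize_embeddings=True to encode()."},
    relevant_docs={"q1": {"d1"}},
    name="toy",
)

# "results/ir-eval" need not exist up front: as of 5.1.1 the evaluator
# creates missing directories before writing its CSV output.
metrics = evaluator(model, output_path="results/ir-eval")
print(metrics)
```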
🔧 Affected Symbols
- `SentenceTransformer.encode`
- `SentenceTransformer.get_model_kwargs`
- `CrossEncoderRerankingEvaluator`
- `mine_hard_negatives`
- SparseEncoder evaluators (FLOPS calculation)
- `InformationRetrievalEvaluator`
- losses module import of `SentenceTransformer`
- `forward` method input handling