
v5.1.1

Breaking Changes
📦 sentence-transformers
⚠️ 1 breaking · ✨ 3 features · 🐛 8 fixes · 🔧 8 symbols

Summary

Version 5.1.1 adds explicit validation of unsupported kwargs in `encode`, introduces FLOPS metrics for SparseEncoder evaluators, adds support for Knowledgeable Passage Retriever (KPR) models, and includes several bug fixes around batch size handling, multi‑GPU processing, and evaluator output paths.

⚠️ Breaking Changes

  • SentenceTransformer.encode now raises a ValueError when called with keyword arguments that are not supported by the underlying model (e.g., using `normalize` instead of `normalize_embeddings`). This is a breaking change because previously such arguments were silently ignored. Fix by removing unsupported kwargs or using the correct parameter names; you can call `SentenceTransformer.get_model_kwargs()` to see which extra arguments are allowed.
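For example, a misspelled kwarg that used to be a silent no-op now fails loudly. A minimal sketch (the model name is illustrative):

```python
from sentence_transformers import SentenceTransformer

# Model name is illustrative; any SentenceTransformer model behaves the same.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Before 5.1.1, the misspelled kwarg below was silently ignored and the
# embeddings were NOT normalized; now it raises a ValueError.
try:
    model.encode(["Hello world"], normalize=True)
except ValueError as err:
    print(err)

# The supported parameter name:
embeddings = model.encode(["Hello world"], normalize_embeddings=True)
```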

Migration Steps

  1. Search your code for calls to `model.encode` that pass unsupported keyword arguments (e.g., `normalize`). Remove them or replace with the correct arguments such as `normalize_embeddings`.
  2. If you need to know which extra kwargs a model accepts, call `model.get_model_kwargs()` and adjust your calls accordingly.
  3. Review any usage of `CrossEncoderRerankingEvaluator` and ensure you are passing the `batch_size` argument if you rely on it (see the sketch after this list).
  4. No code changes are required for the multi‑GPU fix, but be aware that embeddings are now moved to CPU before concatenation.
  5. If you rely on custom output directories for evaluators, you can now omit pre‑creating them; the library will create them automatically.
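
For step 3, a minimal sketch of passing `batch_size` explicitly, assuming the usual samples format of query/positive/negative dicts (model name and data are illustrative):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")  # illustrative

samples = [
    {
        "query": "What is the capital of France?",
        "positive": ["Paris is the capital of France."],
        "negative": ["Berlin is the capital of Germany."],
    },
]

# batch_size is now honored during reranking (it was previously ignored).
evaluator = CrossEncoderRerankingEvaluator(samples, at_k=1, batch_size=16)
results = evaluator(model)
```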

✨ New Features

  • Added a `get_model_kwargs` method to expose model‑specific extra keyword arguments and to enforce validation of kwargs in `encode` (see the sketch after this list).
  • Added FLOPS calculation and updated metrics in SparseEncoder evaluators.
  • Added support for Knowledgeable Passage Retriever (KPR) models.
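
A minimal sketch of the new `get_model_kwargs` method; the exact return value (the names of accepted extra kwargs) depends on the model's modules:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

# Lists the extra keyword arguments this model's modules accept in
# `encode`; anything outside this set now raises a ValueError.
print(model.get_model_kwargs())
```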

🐛 Bug Fixes

  • Fixed `batch_size` being ignored in `CrossEncoderRerankingEvaluator`.
  • Fixed multi‑GPU processing in `encode` by moving embeddings from all devices to CPU before stacking.
  • Updated `mine_hard_negatives` to use `encode_query` and `encode_document` automatically based on defined prompts.
  • Fixed "Path does not exist" errors when an Evaluator is called with an `output_path` that does not yet exist (now creates missing directories).
  • Corrected the reported number of missing negatives in `mine_hard_negatives`.
  • Ensured that `input_ids`, `attention_mask`, `token_type_ids`, and `inputs_embeds` are always passed to the model's `forward` method.
  • Explicitly imported `SentenceTransformer` class in the losses module to avoid import errors.
  • Added `makedirs` handling in `InformationRetrievalEvaluator` for output paths.
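
To illustrate the output-path fixes, a sketch where `output_path` points at a directory that does not exist yet; the library now creates it instead of raising (the model name and data are toy examples):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

queries = {"q1": "What is the capital of France?"}
corpus = {"d1": "Paris is the capital of France.", "d2": "Berlin is in Germany."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy")

# "results/ir" need not exist beforehand; missing directories are created
# before the CSV results are written.
results = evaluator(model, output_path="results/ir")
```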

🔧 Affected Symbols

  • `SentenceTransformer.encode`
  • `SentenceTransformer.get_model_kwargs`
  • `CrossEncoderRerankingEvaluator`
  • `mine_hard_negatives`
  • SparseEncoder evaluators (FLOPS calculation)
  • `InformationRetrievalEvaluator`
  • losses module import of `SentenceTransformer`
  • `forward` method input handling