Changelog

v0.31.0

Breaking Changes
📦 huggingface-hub
⚠️ 2 breaking · ✨ 9 features · 🐛 8 fixes · 🔧 6 symbols

Summary

This release introduces major enhancements to Inference Providers, adding support for LoRA inference via fal.ai and Replicate, and enabling 'auto' provider selection as the new default. Additionally, Xet uploads now support byte arrays, and large file downloads (>50GB) are more reliable.

⚠️ Breaking Changes

  • The default value of the 'provider' argument in InferenceClient and AsyncInferenceClient is now "auto" instead of "hf-inference" (HF Inference API). If your code relied on the previous default, explicitly set provider="hf-inference"; with provider="auto", the first provider available for the model (according to your provider order) is selected.
  • HF Inference API Routing Update: The inference URL path for 'feature-extraction' and 'sentence-similarity' tasks has changed from https://router.huggingface.co/hf-inference/pipeline/{task}/{model} to https://router.huggingface.co/hf-inference/models/{model}/pipeline/{task}.

Migration Steps

  1. If your code relied on the default provider being "hf-inference", update your InferenceClient or AsyncInferenceClient initialization to explicitly set provider="hf-inference" or rely on the new default provider="auto" if you have configured provider preferences.
  2. If you are using HF Inference API for 'feature-extraction' or 'sentence-similarity' tasks, update any hardcoded routing logic to use the new path structure: https://router.huggingface.co/hf-inference/models/{model}/pipeline/{task}.
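For callers that construct HF Inference API URLs by hand, step 2 amounts to a string-format swap; a minimal sketch of the new path structure (the model/task pair below is only an example):

```python
# Post-v0.31.0 HF Inference API route for 'feature-extraction' and
# 'sentence-similarity' tasks.
NEW_ROUTE = "https://router.huggingface.co/hf-inference/models/{model}/pipeline/{task}"

def build_inference_url(model: str, task: str) -> str:
    """Build the updated inference URL for a model/task pair."""
    return NEW_ROUTE.format(model=model, task=task)

print(build_inference_url("sentence-transformers/all-MiniLM-L6-v2", "feature-extraction"))
# → https://router.huggingface.co/hf-inference/models/sentence-transformers/all-MiniLM-L6-v2/pipeline/feature-extraction
```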

✨ New Features

  • Introduced support for LoRA inference via fal.ai and Replicate inference providers.
  • Enabled 'auto' mode for provider selection in InferenceClient, which selects the first available provider based on user settings.
  • Added support for feature extraction (embeddings) inference with the Sambanova provider.
  • HF Inference API provider now only supports a predefined list of deployed models; cold-starting arbitrary models is no longer supported.
  • Xet now supports uploading byte arrays via upload_file.
  • Added documentation for environment variables used by hf-xet to optimize download/upload performance (e.g., HF_XET_CHUNK_CACHE_SIZE_BYTES, HF_XET_NUM_CONCURRENT_RANGE_GETS).
  • Added HTTP download support for files larger than 50GB via the HF API.
  • Implemented dynamic batching for upload_large_folder, replacing the fixed 50-files-per-commit rule with an adaptive strategy.
  • Added support for new arguments when creating or updating Hugging Face Inference Endpoints (route payload and 'env' parameter).
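The 'auto' provider selection described above can be sketched as a first-match scan over the user's preference order; pick_provider and the availability set here are hypothetical illustrations, not the library's internals:

```python
def pick_provider(preferences: list[str], available: set[str]) -> str:
    """Return the first preferred provider that is actually available.

    Sketch of 'auto' semantics: scan the user's ordered preferences
    and take the first match; fail loudly if nothing matches.
    """
    for provider in preferences:
        if provider in available:
            return provider
    raise RuntimeError("none of the preferred providers is available")

# Hypothetical preference order and availability set.
print(pick_provider(["fal-ai", "replicate", "hf-inference"], {"replicate", "hf-inference"}))
# → replicate
```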

🐛 Bug Fixes

  • Fixed an issue where 'sentence-transformers/all-MiniLM-L6-v2' didn't support the 'feature-extraction' task.
  • Fixed text generation issues.
  • Fixed HfInference conversational handling.
  • Fixed 'sentence_similarity' functionality on InferenceClient.
  • Fixed inference issues with URL endpoints.
  • Fixed an issue where the default CACHE_DIR was incorrect.
  • Added retries on transient errors in the download workflow.
  • Fixed snapshot download behavior in offline mode when downloading to a local directory.

Affected Symbols