v0.1.25-beta
📦 unsloth
✨ 8 features · 🐛 21 fixes · 🔧 8 symbols
Summary
This release delivers significant performance improvements, making inference 20–30% faster, and adds auto-detection of models previously downloaded from sources like LM Studio and Hugging Face. Numerous bug fixes address stability, installation logging, and Studio UI responsiveness.
Migration Steps
- If Unsloth Studio from a previous session fails to close, restart your computer or manually kill the process: find its PID with `lsof -i :8888`, then run `kill -9 <PID>`.
✨ New Features
- Inference speed is now 20–30% faster due to optimizations for tool-calling and repeat penalty.
- Inference token/s speed calculation now correctly excludes startup time, reflecting 'true' inference speed.
- Now auto-detects older or pre-existing models downloaded from LM Studio, Hugging Face, and similar sources.
- Improved tool-calling and websearch reliability with reduced errors.
- Cleaner, smarter install and setup logging across Windows and Linux, now quieter by default with optional `--verbose` diagnostics.
- Unsloth Studio now has a proper shutdown 'x' button; closing the associated terminal fully exits the application.
- Studio shortcuts now launch in a visible terminal.
- Detects always-on reasoning models and shows the Think button as locked-on.
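The corrected token/s calculation above can be illustrated with a minimal sketch (the function and field names here are hypothetical, not Unsloth's internals): the idea is simply to exclude startup/warmup time from the denominator, so short runs are not dominated by model-load overhead.

```python
def true_tokens_per_second(n_tokens: int, total_s: float, startup_s: float) -> float:
    """Tokens/s measured over generation time only, excluding startup overhead.

    A naive calculation divides by total wall time, which understates speed
    when model load / warmup dominates short runs.
    """
    generation_s = total_s - startup_s
    if generation_s <= 0:
        raise ValueError("startup time must be smaller than total time")
    return n_tokens / generation_s

# Example: 256 tokens in 10 s total, of which 2 s was startup.
# Naive: 256 / 10 = 25.6 tok/s; true: 256 / 8 = 32.0 tok/s.
```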
🐛 Bug Fixes
- Fixed CPU usage spikes caused by continuous resubscription in `useLiveQuery`.
- Fixed repetition_penalty default causing a ~24% TPS drop in GGUF inference.
- Skipped `flex_attention` for models with non-zero `attention_dropout`.
- Fixed Colab Studio launch and `setup.ps1` box alignment.
- Fixed Colab setup skipping llama.cpp installation.
- Fixed Colab huggingface-hub conflict and ensured `ensurepip` fallback.
- Fixed Windows installer failing on `_yaml.pyd` Access Denied errors.
- Fixed `install.sh` Mac Intel compatibility and Studio no-torch support.
- Fixed installation dependencies for no-torch mode by using `unsloth[huggingfacenotorch]` instead of `--no-deps`.
- Fixed Gemma3N audio training stride assertion with non-reentrant checkpointing.
- Fixed missing `num_items_in_batch` in `unsloth_prediction_step`.
- Fixed ~1.2s TTFT penalty when tools are enabled in Studio.
- Fixed GGUF GPU fit check to correctly account for KV cache VRAM.
- Fixed orphan server cleanup killing user's own llama-server instances.
- Fixed inference failing for transformers 5.x models due to `trust_remote_code` issues.
- Fixed no-torch install dependencies by pulling them from a requirements file.
- Fixed no-torch install dependencies to prevent transitively pulling torch.
- Fixed streaming tool detection by guarding late `tool_calls` and filtering incomplete fragments.
- Fixed HF cache default and ensured LM Studio models show in chat/inference.
- Fixed disabling OCR in `pymupdf4llm` PDF extraction.
- Studio UI fixes: aligned Dataset/Parameters/Training cards, fixed expandable height, animated LoRA settings, humanized ETA display, and replaced navbar shutdown text with icon-only button.
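The KV-cache accounting mentioned in the GPU fit check fix can be approximated with the standard formula; this is a sketch under assumed model parameters, not Unsloth's actual implementation. Each transformer layer stores a K and a V tensor of shape context_length × n_kv_heads × head_dim, so cache size grows linearly with context length.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache VRAM for a decoder-only transformer.

    2 tensors (K and V) per layer, each ctx_len x n_kv_heads x head_dim.
    bytes_per_elem=2 assumes an f16 KV cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Example: a Llama-3-8B-like config (32 layers, 8 KV heads, head_dim 128)
# at an 8192-token context needs roughly 1 GiB of cache on top of weights.
cache_gib = kv_cache_bytes(32, 8, 128, 8192) / 1024**3
```

A fit check then compares model weights plus this cache against free VRAM, rather than weights alone.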