v0.1.25-beta
📦 unsloth
✨ 8 features · 🐛 21 fixes · 🔧 8 symbols
Summary
This release delivers significant performance improvements, making inference 20–30% faster, and adds auto-detection of models previously downloaded from sources like LM Studio and Hugging Face. Numerous bug fixes address stability, installation logging, and Studio UI responsiveness.
Migration Steps
- If Unsloth Studio from a previous session fails to close, restart your computer or manually kill the process: find its PID with `lsof -i :8888`, then run `kill -9 <PID>`.
✨ New Features
- Inference speed is now 20–30% faster due to optimizations for tool-calling and repeat penalty.
- Inference token/s speed calculation now correctly excludes startup time, reflecting 'true' inference speed.
- Now auto-detects older or pre-existing models downloaded from LM Studio, Hugging Face, and similar sources.
- Improved tool-calling and websearch reliability with reduced errors.
- Cleaner, smarter install and setup logging across Windows and Linux, now quieter by default with optional `--verbose` diagnostics.
- Unsloth Studio now has a proper shutdown 'x' button; closing the associated terminal fully exits the application.
- Studio shortcuts now launch in a visible terminal.
- Detects always-on reasoning models and shows the Think button as locked-on.
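The corrected token/s calculation above can be illustrated with a minimal sketch (the function and field names here are hypothetical, not Unsloth's internals): the idea is simply to exclude startup/warmup time from the denominator, so short runs are not dominated by model-load overhead.

```python
def true_tokens_per_second(n_tokens: int, total_s: float, startup_s: float) -> float:
    """Tokens/s measured over generation time only, excluding startup overhead.

    A naive calculation divides by total wall time, which understates speed
    when model load / warmup dominates short runs.
    """
    generation_s = total_s - startup_s
    if generation_s <= 0:
        raise ValueError("startup time must be smaller than total time")
    return n_tokens / generation_s

# Example: 256 tokens in 10 s total, of which 2 s was startup.
# Naive: 256 / 10 = 25.6 tok/s; true: 256 / 8 = 32.0 tok/s.
```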
🐛 Bug Fixes
- Fixed CPU usage spikes caused by continuous resubscription in `useLiveQuery`.
- Fixed repetition_penalty default causing a ~24% TPS drop in GGUF inference.
- Skipped `flex_attention` for models with non-zero `attention_dropout`.
- Fixed Colab Studio launch and `setup.ps1` box alignment.
- Fixed Colab setup skipping llama.cpp installation.
- Fixed Colab huggingface-hub conflict and ensured `ensurepip` fallback.
- Fixed Windows installer failing on `_yaml.pyd` Access Denied errors.
- Fixed `install.sh` Mac Intel compatibility and Studio no-torch support.
- Fixed installation dependencies for no-torch mode by using `unsloth[huggingfacenotorch]` instead of `--no-deps`.
- Fixed Gemma3N audio training stride assertion with non-reentrant checkpointing.
- Fixed missing `num_items_in_batch` in `unsloth_prediction_step`.
- Fixed ~1.2s TTFT penalty when tools are enabled in Studio.
- Fixed GGUF GPU fit check to correctly account for KV cache VRAM.
- Fixed orphan server cleanup killing user's own llama-server instances.
- Fixed inference failing for transformers 5.x models due to `trust_remote_code` issues.
- Fixed no-torch install dependencies by pulling them from a requirements file.
- Fixed no-torch install dependencies to prevent transitively pulling torch.
- Fixed streaming tool detection by guarding late `tool_calls` and filtering incomplete fragments.
- Fixed HF cache default and ensured LM Studio models show in chat/inference.
- Fixed disabling OCR in `pymupdf4llm` PDF extraction.
- Studio UI fixes: aligned Dataset/Parameters/Training cards, fixed expandable height, animated LoRA settings, humanized ETA display, and replaced navbar shutdown text with icon-only button.
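The KV-cache accounting mentioned in the GPU fit check fix can be approximated with the standard formula; this is a sketch under assumed model parameters, not Unsloth's actual implementation. Each transformer layer stores a K and a V tensor of shape context_length × n_kv_heads × head_dim, so cache size grows linearly with context length.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache VRAM for a decoder-only transformer.

    2 tensors (K and V) per layer, each ctx_len x n_kv_heads x head_dim.
    bytes_per_elem=2 assumes an f16 KV cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Example: a Llama-3-8B-like config (32 layers, 8 KV heads, head_dim 128)
# at an 8192-token context needs roughly 1 GiB of cache on top of weights.
cache_gib = kv_cache_bytes(32, 8, 128, 8192) / 1024**3
```

A fit check then compares model weights plus this cache against free VRAM, rather than weights alone.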