v0.1.38-beta
📦 unsloth
✨ 10 features🐛 23 fixes🔧 2 symbols
Summary
This release introduces powerful local LLM API serving via `llama-server` with features like self-healing tool calling, code execution, and web search. Numerous stability and usability improvements were made across the Unsloth Studio interface and training pipeline.
Migration Steps
- When running locally, Studio now defaults the host to 127.0.0.1 and prompts before auto-starting.
- Use `unsloth studio run --local` or pass model arguments like `model:quant` to `unsloth studio run` to load models via the server.
- Use `--enable-tools`/`--disable-tools` server-side flags with `unsloth run` to control tool policy.
✨ New Features
- Support for connecting local LLMs (like Qwen and Gemma) to Unsloth's API endpoint for local inference.
- Introduction of self-healing tool calling, reducing broken/malformed tool calls by 50%.
- Code execution support (Bash and Python) for more accurate code outputs.
- Advanced web search capability that visits and reads webpages for in-depth information.
- Automatic inference settings tuning for GGUF models (temp, top-k, etc.).
- Local models exposed as an authenticated API via `llama-server`.
- API supports Anthropic-compatible `/v1/messages` dialect.
- API supports OpenAI-compatible `/v1/chat/completions` and `/v1/responses` dialects.
- Both API dialects support streaming, tool calling, and vision inputs.
- Added support for NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral 3.5 Medium models.
🐛 Bug Fixes
- Stopped Studio training runs can now resume from checkpoints.
- Chat threads now autosave and persist more reliably.
- Fixed DPO training hangs in multi-process setups.
- Improved VLM GRPO support with MROPE updates.
- Studio's stop button now properly stops generation.
- Fixed chat template disappearing after browser refresh.
- Studio now uses (gguf) context length before max seq length.
- Fixed typo cleanup across tests and backend strings.
- Guarded resolve_model_class fallback against unresolvable transformers AutoModel entries.
- Studio now kills in-flight llama-server before spawning a new one.
- Studio fixed currency escape from breaking inline LaTeX.
- Studio now probes AMD GPUs in llama-server VRAM detection.
- Fixed mmproj F16 variant selection using endswith.
- Fixed Windows installation when paths contain spaces or Python 3.14 is on PATH.
- Studio preserved transparency in uploaded profile avatars.
- Fixed image-only chat requests failing validation in Studio.
- Patched checkpoint reload init functions to strip unsupported arguments.
- Fixed DPO trainer multi-process hang.
- Fixed local model scanner to handle ollama cloud models.
- Fixed Studio desktop tray installer and titlebar issues.
- Fixed check for libcurl headers in install.sh.
- Fixed FP8 weight shape check using % 8 instead of // 8.
- Pinned Studio GGUF export to llama.cpp's local convert script.