Change8

v0.1.39-beta

📦 unsloth
10 features · 🐛 24 fixes · 🔧 4 symbols

Summary

This release adds a self-hosted API endpoint for serving local LLMs, with support for tool calling, code execution, and web search. It also fixes numerous bugs across the Studio UI, training stability (DPO hangs), and the installation scripts.

Migration Steps

  1. Update to `2026.5.2` by running `curl -fsSL https://unsloth.ai/install.sh | sh` or `unsloth studio update` to resolve the chat history and attachment bugs.
  2. If resuming training, patch the checkpoint-reload init functions to strip unsupported arguments.
  3. If using Studio, note that the default host is now 127.0.0.1 and it prompts before auto-start.
  4. To customize server startup, pass llama-server arguments through `unsloth studio run --forward-args` or a similar mechanism.
  5. Use `model_name:quantization_type` syntax when loading models via Studio run commands.
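
Step 2 does not name the init functions or arguments involved, so the following is only a generic sketch of the idea: filter a saved keyword-argument dict against the target callable's signature before calling it. `strip_unsupported_kwargs`, `TrainerStub`, and the example arguments are hypothetical names used for illustration and are not part of Unsloth.

```python
import inspect


def strip_unsupported_kwargs(func, kwargs):
    """Return only the entries of `kwargs` that `func` can accept by keyword."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    accepted = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in accepted}


class TrainerStub:
    """Hypothetical stand-in for an object rebuilt from a checkpoint."""

    def __init__(self, learning_rate, max_steps=100):
        self.learning_rate = learning_rate
        self.max_steps = max_steps


# Hypothetical saved arguments; `legacy_flag` is no longer accepted.
saved_args = {"learning_rate": 2e-4, "max_steps": 60, "legacy_flag": True}
clean_args = strip_unsupported_kwargs(TrainerStub, saved_args)
trainer = TrainerStub(**clean_args)
```

Filtering against the signature, rather than deleting a fixed list of keys, keeps the patch working if further arguments are dropped later.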

✨ New Features

  • Local LLMs (such as Qwen and Gemma) can now be served through Unsloth's API endpoint for use from clients such as Claude Code and Codex, enabling features like self-healing tool calling, code execution (Bash/Python), and advanced web search.
  • Unsloth API endpoint exposes models via `llama-server` speaking Anthropic-compatible `/v1/messages` and OpenAI-compatible `/v1/chat/completions` and `/v1/responses` dialects.
  • Added support for new models: NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral 3.5 Medium.
  • Added support for Qwen3.6.
  • Studio: Added dataset upload dropzone.
  • Studio: Enabled deleting fine-tuned chat models.
  • Studio: Added checkpoint resume functionality for stopped training runs.
  • Studio: Default host set to 127.0.0.1 and prompts before auto-start.
  • Studio: `unsloth studio run` can now forward llama-server arguments and accepts `model:quant` to load models.
  • unsloth run: Added --enable-tools/--disable-tools server-side tool policy.

🐛 Bug Fixes

  • Fixed chat history not being shown (existing history is preserved).
  • Fixed attachments not attaching correctly (render-only bug).
  • Stopped Studio training runs can now resume from checkpoints.
  • Chat threads now autosave and persist more reliably.
  • Fixed DPO training hangs in multi-process setups.
  • Improved VLM GRPO support with MROPE updates.
  • Studio's stop button now properly stops generation.
  • Fixed chat template disappearing after browser refresh.
  • Fixed issue where Studio used the max sequence length instead of the GGUF context length.
  • Cleaned up typos across tests and backend strings.
  • Guarded resolve_model_class fallback against unresolvable transformers AutoModel entries.
  • Studio: Kills in-flight llama-server before spawning a new one.
  • Studio: Currency escaping no longer breaks inline LaTeX.
  • Studio: llama-server VRAM detection now probes AMD GPUs.
  • Fixed issue where mmproj F16 variant selection used incorrect logic.
  • Fixed Windows install issues when paths contain spaces or Python 3.14 is on PATH.
  • Studio: Preserved transparency in uploaded profile avatars.
  • Fixed UX issue with single chat header error placement and selector alignment.
  • Studio: Fixed clipped model selector text descenders.
  • Fixed DPO trainer multi-process hang.
  • Fixed local model scanner handling of ollama cloud models.
  • Fixed Studio desktop tray installer and titlebar issues.
  • Fixed check for libcurl headers in install.sh.
  • Fixed image-only chat requests failing validation.

Affected Symbols