Change8

v0.1.38-beta

📦 unsloth
10 features🐛 23 fixes🔧 2 symbols

Summary

This release introduces powerful local LLM API serving via `llama-server` with features like self-healing tool calling, code execution, and web search. Numerous stability and usability improvements were made across the Unsloth Studio interface and training pipeline.

Migration Steps

  1. When running locally, Studio now defaults the host to 127.0.0.1 and prompts before auto-starting.
  2. Use `unsloth studio run --local` or pass model arguments like `model:quant` to `unsloth studio run` to load models via the server.
  3. Use `--enable-tools`/`--disable-tools` server-side flags with `unsloth run` to control tool policy.

✨ New Features

  • Support for connecting local LLMs (like Qwen and Gemma) to Unsloth's API endpoint for local inference.
  • Introduction of self-healing tool calling, reducing broken/malformed tool calls by 50%.
  • Code execution support (Bash and Python) for more accurate code outputs.
  • Advanced web search capability that visits and reads webpages for in-depth information.
  • Automatic inference settings tuning for GGUF models (temp, top-k, etc.).
  • Local models exposed as an authenticated API via `llama-server`.
  • API supports Anthropic-compatible `/v1/messages` dialect.
  • API supports OpenAI-compatible `/v1/chat/completions` and `/v1/responses` dialects.
  • Both API dialects support streaming, tool calling, and vision inputs.
  • Added support for NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral 3.5 Medium models.

🐛 Bug Fixes

  • Stopped Studio training runs can now resume from checkpoints.
  • Chat threads now autosave and persist more reliably.
  • Fixed DPO training hangs in multi-process setups.
  • Improved VLM GRPO support with MROPE updates.
  • Studio's stop button now properly stops generation.
  • Fixed chat template disappearing after browser refresh.
  • Studio now uses (gguf) context length before max seq length.
  • Fixed typo cleanup across tests and backend strings.
  • Guarded resolve_model_class fallback against unresolvable transformers AutoModel entries.
  • Studio now kills in-flight llama-server before spawning a new one.
  • Studio fixed currency escape from breaking inline LaTeX.
  • Studio now probes AMD GPUs in llama-server VRAM detection.
  • Fixed mmproj F16 variant selection using endswith.
  • Fixed Windows installation when paths contain spaces or Python 3.14 is on PATH.
  • Studio preserved transparency in uploaded profile avatars.
  • Fixed image-only chat requests failing validation in Studio.
  • Patched checkpoint reload init functions to strip unsupported arguments.
  • Fixed DPO trainer multi-process hang.
  • Fixed local model scanner to handle ollama cloud models.
  • Fixed Studio desktop tray installer and titlebar issues.
  • Fixed check for libcurl headers in install.sh.
  • Fixed FP8 weight shape check using % 8 instead of // 8.
  • Pinned Studio GGUF export to llama.cpp's local convert script.

Affected Symbols