Changelog

v0.1.35-beta

📦 unsloth · View on GitHub →
⚠️ 1 breaking change · ✨ 11 features · 🐛 16 fixes · 🔧 6 affected symbols

Summary

This release introduces full support for the new Gemma 4 model family, including low-RAM variants, alongside substantial improvements to tool calling accuracy, stability, and web search functionality. Numerous fixes were also applied across the Unsloth Studio environment and underlying build processes, particularly concerning llama.cpp integration and dependency pinning.

⚠️ Breaking Changes

  • The transformers requirement is now pinned for Gemma 4 training/inference. Workflows that rely on versions outside the pinned range (e.g., transformers==4.57.6 or the 5.5.0 stable line) may break; verify your existing setup against the new transformers pins.

Migration Steps

  1. If encountering issues with tool calling stability or accuracy, ensure you are using the latest Unsloth build, as significant improvements were made to tool call logic.
  2. If using custom llama.cpp builds or targeting macOS metal, verify successful compilation based on recent fixes.
  3. Users relying on transformers versions outside the pinned range should review the pins set for Gemma 4 support (e.g., transformers==4.57.6 or the 5.5.0 stable line).
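If you script environment checks, the pin verification in step 3 can be sketched as follows. This is a minimal sketch: the supported range here is an assumption based on the versions mentioned above, so check the requirements actually shipped with this release.

```python
def parse_version(ver: str) -> tuple:
    """Split a plain dotted version string into a tuple of ints for comparison.

    Note: does not handle pre-release suffixes like '4.57.6.dev0'.
    """
    return tuple(int(part) for part in ver.split("."))


def transformers_pin_ok(ver: str) -> bool:
    # Assumed pins from the migration steps: exactly 4.57.6,
    # or the 5.5.0-or-later stable line.
    v = parse_version(ver)
    return v == (4, 57, 6) or v >= (5, 5, 0)


# Example: check the installed transformers version before training Gemma 4.
# from importlib.metadata import version
# assert transformers_pin_ok(version("transformers"))
```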

✨ New Features

  • Support for running and training the new Gemma 4 models (E2B, E4B, 26B-A4B, 31B) in Unsloth.
  • Multimodal reasoning Gemma 4 models are now licensed under Apache 2.0.
  • E2B and E4B models can run on 6GB RAM and on phones.
  • 26B-A4B and 31B models can run on approximately 18GB RAM.
  • Tool call accuracy for all models increased by +30% to +80%.
  • Web search now retrieves actual web content instead of just summaries.
  • Increased maximum number of tool calls allowed from 10 to 25.
  • Improved tool call termination logic to reduce looping/repetitions.
  • Added more tool call healing and de-duplication logic to prevent XML leakage in tool calls.
  • Implemented architecture-aware KV cache VRAM estimation with five architecture-specific estimation paths.
  • Ability to display images from Python tool execution in the Unsloth Studio chat UI.
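For context on the KV cache estimation feature: the five architecture-specific paths are internal to Unsloth, but the basic dense-attention calculation they refine looks roughly like this. This is a sketch, not the actual implementation, and all parameter names are illustrative.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Estimate KV cache size for plain dense attention.

    Each layer stores K and V tensors of shape
    (n_kv_heads, seq_len, head_dim); the leading factor of 2
    accounts for both K and V. dtype_bytes=2 assumes fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes


# e.g. a 32-layer model with 8 KV heads of dim 128 at 4096 context in fp16:
# kv_cache_bytes(32, 8, 128, 4096)  ->  536870912 bytes (0.5 GiB)
```

Architecture-aware estimators adjust this baseline for grouped-query attention, sliding-window layers, and similar variants, which is why a single formula is not enough in practice.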

🐛 Bug Fixes

  • Tool calls for smaller models are now more stable and no longer get cut off.
  • Context length is now properly applied across models.
  • Fixed XML leaks in responses (reduced from 10/10 to 0/10 in tests).
  • Fixed SSL failures and empty page content issues in Studio web search.
  • Added tokenizers to the no-torch dependency set and set TORCH_CONSTRAINT for arm64 macOS on Python 3.13+.
  • Fixed Studio model selector styling for OOM models.
  • Fixed crash when loading local GGUF models on Windows.
  • Fixed save_pretrained_merged for fully finetuned models.
  • Fixed custom llama.cpp source builds and macOS metal source builds.
  • Fixed shell injection vulnerability during GGML export conversion.
  • Fixed Studio issue where small models would stall on tool-calling tasks.
  • Fixed Studio chat font changes leaking outside the chat page.
  • Fixed incorrect loading text for cached models during inference.
  • Fixed Windows issue compiling llama.cpp from source.
  • Fixed Studio issue where curated defaults were not prioritized in the Recommended model list.
  • Fixed Studio issue suppressing fatal errors when ggml-org has no prebuilt manifest.
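On the shell injection fix: the usual remedy for this class of bug is to pass the conversion command as an argument list rather than an interpolated shell string. A minimal sketch of the pattern follows; the script name and flags are illustrative, not Unsloth's actual conversion command.

```python
def build_convert_argv(model_path: str, out_path: str) -> list:
    # Each value is a single argv element, so shell metacharacters in a
    # filename (e.g. "model; rm -rf ~") cannot be interpreted as commands.
    return ["python", "convert_to_gguf.py", model_path, "--outfile", out_path]


# Run without a shell, so no string interpolation is ever parsed by /bin/sh:
# import subprocess
# subprocess.run(build_convert_argv(src, dst), check=True)
```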

Affected Symbols