b8946
📦 llama-cpp
Summary
This release cleans up the attention implementation by removing a redundant scaling factor for Qwen3 and LLaMA models. It also ships extensive pre-built binaries for a range of operating systems and hardware configurations.
✨ New Features
- Removed a duplicate application of the output-weight scale (`wo_s`) after attention block construction for Qwen3 and LLaMA models.