b8946
📦 llama-cpp
Summary
This release cleans up the attention implementation by removing a redundant scaling factor for Qwen3 and LLaMA models. It also ships extensive pre-built binaries for a range of operating systems and hardware configurations.
✨ New Features
- Removed a duplicate application of the output-weight scale (`wo_s`) after attention block construction for Qwen3 and LLaMA models.