
b8946

📦 llama-cpp

Summary

This release cleans up the attention mechanism implementation by removing a redundant scaling factor for Qwen3 and LLaMA models. It also provides extensive pre-built binaries for various operating systems and hardware configurations.

✨ New Features

  • Removed a duplicate wo_s (attention output weight scale) factor that was applied a second time after the attention block was built for Qwen3 and LLaMA models.

Affected Symbols