b9200
📦 llama-cpp
✨ 1 feature · 🐛 1 fix · 🔧 2 symbols
Summary
This release optimizes prompt decoding performance in MTP for llama models and includes a fix for llama-graph. It also provides numerous pre-compiled binaries for various operating systems and hardware configurations.
✨ New Features
- Improved MTP (Multi-Token Prediction) performance for llama models by avoiding unnecessary copying of logits during prompt decoding.
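
The general idea behind this kind of optimization can be sketched as follows. This is a toy illustration, not llama.cpp code: `DecodeResult`, `collect_logits`, and `prompt_output_flags` are hypothetical names, and the assumption is that during prompt decoding only the last position's logits are needed, so the rows for all earlier prompt tokens can be skipped rather than copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy types for illustration only; not part of llama.cpp.
struct DecodeResult {
    std::vector<float> logits;    // logits copied out for the requested positions
    std::size_t rows_copied = 0;  // number of vocabulary-sized rows actually copied
};

// Marks only the last prompt position as needing logits (assumed policy).
std::vector<std::int8_t> prompt_output_flags(std::size_t n_tokens) {
    std::vector<std::int8_t> flags(n_tokens, 0);
    if (n_tokens > 0) {
        flags.back() = 1;
    }
    return flags;
}

// Copies one row of n_vocab logits for each position whose flag is set,
// and skips every other position entirely.
DecodeResult collect_logits(const std::vector<float>& all_logits,
                            std::size_t n_vocab,
                            const std::vector<std::int8_t>& want) {
    assert(all_logits.size() == n_vocab * want.size());
    DecodeResult out;
    for (std::size_t i = 0; i < want.size(); ++i) {
        if (!want[i]) {
            continue;  // no copy for positions nobody will read
        }
        auto first = all_logits.begin() + static_cast<std::ptrdiff_t>(i * n_vocab);
        auto last  = first + static_cast<std::ptrdiff_t>(n_vocab);
        out.logits.insert(out.logits.end(), first, last);
        ++out.rows_copied;
    }
    return out;
}
```

With a 3-token prompt and these flags, `collect_logits` copies a single row instead of three. The saving grows with prompt length and vocabulary size, since each skipped row is `n_vocab` floats.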
🐛 Bug Fixes
- Fixed an issue in llama-graph by ensuring set_output is called for t_h_pre_norm.