b9200

📦 llama-cpp
1 feature · 🐛 1 fix · 🔧 2 symbols

Summary

This release speeds up prompt decoding in MTP for llama models by avoiding unnecessary copies of logits, and includes a fix for llama-graph. Pre-compiled binaries are provided for a range of operating systems and hardware configurations.

✨ New Features

  • Improved performance in MTP (multi-token prediction) for llama models by avoiding unnecessary copying of logits during prompt decoding.
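
The release notes do not say how the copy is avoided. As a rough illustration of the general idea only, the sketch below copies logits just for positions that are flagged as needing them, which during prompt decoding is typically only the last token. The function name `copy_requested_logits` and the toy data are hypothetical and are not llama.cpp code; the per-token flag is loosely modeled on the `logits` array of `llama_batch`.

```python
def copy_requested_logits(logits_rows, output_flags):
    """Return copies of only the logits rows flagged for output.

    logits_rows:  one list of floats per token position
    output_flags: one boolean per token position
    """
    if len(logits_rows) != len(output_flags):
        raise ValueError("one flag per token is required")
    # Rows whose flag is False are never copied.
    return {
        i: list(row)
        for i, (row, flag) in enumerate(zip(logits_rows, output_flags))
        if flag
    }


# Toy prompt: 4 token positions over a 3-entry vocabulary.
prompt_logits = [
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
    [0.9, 0.05, 0.05],
    [0.2, 0.5, 0.3],
]
# Only the final prompt position is sampled from, so only it is flagged.
flags = [False, False, False, True]
copied = copy_requested_logits(prompt_logits, flags)
print(sorted(copied))
```

With a long prompt and a large vocabulary, skipping the unflagged rows avoids moving most of the logits matrix, which is the kind of saving this entry describes.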

🐛 Bug Fixes

  • Fixed an issue in llama-graph by ensuring set_output is called for t_h_pre_norm.
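
The notes do not explain why the call was needed. One plausible reason, stated here as an assumption, is that a compute-graph allocator may recycle the memory of intermediate tensors, so a tensor that must be read after the graph runs has to be marked as an output. The toy planner below illustrates that effect; `Tensor`, `plan_buffers`, `build_chain`, and the tensor names are hypothetical and do not reproduce ggml's allocator.

```python
class Tensor:
    """Toy tensor: a name and an output flag (not ggml's struct)."""

    def __init__(self, name):
        self.name = name
        self.is_output = False


def plan_buffers(chain):
    """Assign a buffer slot to each tensor in a linear chain.

    Each tensor reads only its predecessor, so the predecessor's slot is
    recycled afterwards unless the predecessor is flagged as an output.
    """
    free_slots, next_slot, plan = [], 0, {}
    prev = None
    for t in chain:
        if free_slots:
            slot = free_slots.pop()
        else:
            slot = next_slot
            next_slot += 1
        plan[t.name] = slot
        if prev is not None and not prev.is_output:
            free_slots.append(plan[prev.name])
        prev = t
    return plan


def build_chain(flag_hidden):
    """Build embd -> h_pre_norm -> h_norm -> logits, optionally flagging h_pre_norm."""
    chain = [Tensor(n) for n in ["embd", "h_pre_norm", "h_norm", "logits"]]
    chain[1].is_output = flag_hidden
    return chain


unflagged = plan_buffers(build_chain(False))
flagged = plan_buffers(build_chain(True))
print(unflagged)
print(flagged)
```

In the unflagged plan, `h_pre_norm` ends up sharing a slot with a later tensor, so its contents would be overwritten before anything could read them; once flagged, it keeps a slot of its own.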

Affected Symbols