b9271

📦 llama-cpp
🐛 1 fix · 🔧 1 symbol

Summary

This release improves performance by skipping a redundant logit computation during the draft model's follow-up decode. Pre-compiled binaries are also provided for a range of operating systems and hardware configurations.

🐛 Bug Fixes

  • Fixed an unnecessary logit computation: the draft model no longer computes logits during its follow-up decode, because inp_out_ids is now used to skip that step.
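
The fix above can be pictured with a small sketch. This is not llama.cpp's code. It assumes that inp_out_ids lists the token positions whose outputs are wanted, so an empty list means no logits are computed. The names `gather_rows`, `output_logits`, `hidden`, `w_out` and `out_ids` are hypothetical and chosen only for illustration.

```python
def gather_rows(hidden, out_ids):
    """Keep only the rows of `hidden` whose indices appear in `out_ids`."""
    return [hidden[i] for i in out_ids]


def output_logits(hidden, w_out, out_ids):
    """Project only the selected token rows onto the vocabulary.

    hidden : n_tokens x n_embd (one hidden-state row per token)
    w_out  : n_vocab  x n_embd (hypothetical output projection)
    out_ids: positions that need logits (stand-in for inp_out_ids)
    """
    selected = gather_rows(hidden, out_ids)
    # One dot product per (selected row, vocabulary entry) pair.
    return [
        [sum(h * w for h, w in zip(row, w_row)) for w_row in w_out]
        for row in selected
    ]


hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

all_logits = output_logits(hidden, w_out, [0, 1, 2])  # every token projected
last_only = output_logits(hidden, w_out, [2])         # only the last token
no_logits = output_logits(hidden, w_out, [])          # projection skipped
```

With an empty index list the projection loop never runs, which is the saving this release describes for a decode whose logits would be discarded anyway.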

Affected Symbols