Changelog

b8833

📦 llama-cpp
🐛 3 fixes · 🔧 3 symbols

Summary

This release primarily addresses compiler warnings, refactors FlashAttention encoding in ggml-webgpu, and fixes soft_max precision issues and a potential segfault at process exit on the Vulkan backend.

Migration Steps

  1. Update workflows to remove dependence on llvmpipe.

🐛 Bug Fixes

  • Fixed soft_max calculation and updated reg_tile accumulation to f32 for improved precision.
  • Attempted to avoid segfaults at process exit on the Vulkan backend.
  • Removed compiler warnings related to parameter casting.
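The soft_max precision fix above follows a common pattern: accumulate the exponential sum in f32 rather than half precision, since summing many small f16 values loses low-order bits. The sketch below is illustrative only (the actual ggml-webgpu change lives in WGSL shader code, and `reg_tile` is a shader-side register tile); it shows the numerically stable softmax shape with an explicit f32 accumulator.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtract the row maximum before
// exponentiating, and accumulate the sum in f32. In a shader that
// stores activations in f16, keeping this accumulator in f32 is
// what preserves precision for long rows.
std::vector<float> softmax_f32(const std::vector<float>& x) {
    float max_val = x[0];
    for (float v : x) max_val = std::max(max_val, v);

    float sum = 0.0f;  // f32 accumulator (the precision-critical part)
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - max_val);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}
```

Subtracting the row max does not change the result mathematically but keeps every exponent non-positive, avoiding overflow before the division.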

Affected Symbols