b9499

📅 Jun 4, 2026 📦 llama-cpp

✨ 5 features 🔧 2 symbols

Summary

This release focuses on internal refactoring within ggml-webgpu: it begins a refactor of the FlashAttention implementation and standardizes the quantization logic shared by the flash_attn and mul_mat paths.

✨ New Features

Began refactoring the FlashAttention implementation in ggml-webgpu.
Standardized quantization support across ggml-webgpu components.
Split the K and V quantization logic.
Refactored and abstracted quantization logic for flash_attn and mul_mat.
Added quantization support to the tile path.
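The idea behind these items (one shared dequantization step feeding both mul_mat and flash_attn, with K and V typed independently) can be illustrated with a minimal Python sketch. This is not ggml-webgpu code, which runs as WebGPU shaders. Every name here (`BLOCK`, `quantize_q8`, `dequantize`, `toy_mul_mat`, `toy_flash_attn`) is hypothetical, the Q8_0-style layout (blocks of 32 int8 values with one scale) is an assumption, and the tile path is not modeled. `toy_flash_attn` uses the online-softmax rescaling at the core of FlashAttention, but without any tiling or shared memory.

```python
import math

BLOCK = 32  # values per block; a Q8_0-style layout is assumed for illustration


def quantize_q8(values):
    """Quantize a float row into (scale, int8 values) blocks of up to BLOCK entries."""
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0
        inv = 1.0 / scale if scale else 0.0
        blocks.append((scale, [round(v * inv) for v in chunk]))
    return blocks


def dequantize(kind, row):
    """Single shared entry point that both toy kernels call to expand a row."""
    if kind == "f32":
        return list(row)
    if kind == "q8_0":
        out = []
        for scale, qs in row:
            out.extend(scale * q for q in qs)
        return out
    raise ValueError(f"unsupported type: {kind}")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_mul_mat(w_kind, w_rows, x):
    """Matrix-vector product; weight rows go through the shared dequantize()."""
    return [dot(dequantize(w_kind, row), x) for row in w_rows]


def toy_flash_attn(q, k_kind, k_rows, v_kind, v_rows):
    """Single-query attention with an online (streaming) softmax.

    K and V carry separate type tags, so each is dequantized independently.
    """
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf  # running maximum score
    l = 0.0        # running sum of exp(score - m)
    acc = None     # running exp-weighted sum of V rows
    for k_row, v_row in zip(k_rows, v_rows):
        k = dequantize(k_kind, k_row)
        v = dequantize(v_kind, v_row)
        s = dot(q, k) * scale
        m_new = max(m, s)
        correction = math.exp(m - m_new)  # rescales earlier terms; 0.0 on the first step
        p = math.exp(s - m_new)
        l = l * correction + p
        if acc is None:
            acc = [p * vi for vi in v]
        else:
            acc = [a * correction + p * vi for a, vi in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

For example, `toy_flash_attn(q, "q8_0", [quantize_q8(r) for r in K], "f32", V)` pairs a quantized K cache with a full-precision V cache. Routing both kernels through one `dequantize` keeps the type-specific decoding in a single place, which is the kind of consolidation the release notes describe. The real kernels may structure this differently.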

Affected Symbols

flash_attn
mul_mat