b8929
📦 llama-cpp
🔧 1 symbol
Summary
The default quantization type in llama-quant has changed from Q5_1 to the more robust Q8_0, which improves stability for callers that rely on default parameters. Pre-compiled binaries for a range of operating systems and hardware configurations are also provided.
Migration Steps
- If you rely on the default quantization parameters in `llama_model_quantize_params`, note that the default `ftype` has changed from `LLAMA_FTYPE_MOSTLY_Q5_1` to `LLAMA_FTYPE_MOSTLY_Q8_0`. Set `ftype` explicitly if you need the previous behavior.