b8873

📅 Apr 21, 2026 📦 llama-cpp

✨ 5 features 🐛 4 fixes 🔧 2 symbols

Summary

This release improves the OpenVINO backend with thread-safety enhancements, reduced NPU memory usage via weightless caching, and support for Gelu tanh and Imrope. The OpenVINO CI/CD pipelines were also restructured.

Migration Steps

Use i4/i8 quantization directly for symmetric quantization cases in OpenVINO.
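For readers unfamiliar with the term, "symmetric" quantization generally means the integer grid is centered on zero, so only a scale is stored and no zero-point. The sketch below illustrates that idea for i8 in plain C++. It is a generic illustration, not the OpenVINO backend's code: the names `QuantizedI8`, `quantize_symmetric_i8`, and `dequantize_i8` are hypothetical, and the per-tensor max-abs scale is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 values plus one per-tensor scale.
// There is no zero-point because the scheme is symmetric around 0.
struct QuantizedI8 {
    std::vector<int8_t> values;
    float scale = 1.0f;  // dequantized value = values[i] * scale
};

// Symmetric i8 quantization: map [-max_abs, +max_abs] onto [-127, 127].
inline QuantizedI8 quantize_symmetric_i8(const std::vector<float> & x) {
    float max_abs = 0.0f;
    for (float v : x) {
        max_abs = std::max(max_abs, std::fabs(v));
    }

    QuantizedI8 q;
    // An all-zero input keeps the default scale of 1 to avoid dividing by zero.
    q.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    assert(q.scale > 0.0f);

    q.values.reserve(x.size());
    for (float v : x) {
        long r = std::lround(v / q.scale);
        r = std::max(-127L, std::min(127L, r));  // clamp to the symmetric range
        q.values.push_back(static_cast<int8_t>(r));
    }
    return q;
}

// Inverse mapping: multiply each integer by the shared scale.
inline std::vector<float> dequantize_i8(const QuantizedI8 & q) {
    std::vector<float> out;
    out.reserve(q.values.size());
    for (int8_t v : q.values) {
        out.push_back(static_cast<float>(v) * q.scale);
    }
    return out;
}
```

With this scheme the round-trip error of each element is at most half the scale, provided the input contains only finite values.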

✨ New Features

Implemented per-request thread-safety guarantees.
Added support for Gelu tanh activation function.
Added support for Imrope.
Added WeightlessCacheAttribute to reduce NPU memory usage for OpenVINO.
Added GPU and NPU support to the OpenVINO Dockerfile.
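For reference, "Gelu tanh" in the list above usually denotes the tanh approximation of GELU, 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). The following is a minimal standalone sketch of that formula, assuming this is the variant meant. It does not show how the OpenVINO backend implements the op, and the function names `gelu_tanh` and `gelu_tanh_inplace` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// tanh approximation of GELU:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float gelu_tanh(float x) {
    const float sqrt_2_over_pi = 0.7978845608f;  // sqrt(2 / pi)
    const float cubic_coeff    = 0.044715f;
    const float inner = sqrt_2_over_pi * (x + cubic_coeff * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(inner));
}

// Apply the activation element-wise to a buffer of n floats.
inline void gelu_tanh_inplace(float * data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = gelu_tanh(data[i]);
    }
}
```

The function behaves like the identity for large positive inputs and approaches zero for large negative inputs, which makes it easy to sanity-check an implementation.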

🐛 Bug Fixes

Fixed ROPE yarn case.
Fixed sticky stateful configuration issues.
Fixed explicit ov::Tensor frees in ggml_backend_openvino_free.
Fixed thread-safety issues related to the shared runtime context.

Affected Symbols

ggml_backend_openvino_free
WeightlessCacheAttribute