b8106
🦙 llama-cpp · View on GitHub
✨ 2 features · 🐛 5 fixes · 🔧 2 symbols
Summary
This release introduces full support for the JAIS-2 model architecture, including specific fixes for tokenizer hashing, RoPE type, and control vector support. It also notes that JAIS-2 requires F32 precision accumulators on CUDA.
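The CUDA precision note can be illustrated with a toy experiment: simulating an F16 accumulator in a long same-sign dot product. This is a pure-Python sketch using `struct`'s half-precision `'e'` format; the names `to_f16` and `dot` are illustrative helpers, not llama.cpp APIs.

```python
import struct

def to_f16(x: float) -> float:
    """Round-trip a float through IEEE half precision (simulates F16 storage)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def dot(a, b, f16_accumulate: bool) -> float:
    """Dot product over F16 inputs; optionally round the running sum to F16
    after every step, mimicking an F16 accumulator in a matmul kernel."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_f16(x) * to_f16(y)
        if f16_accumulate:
            acc = to_f16(acc)
    return acc

# Many small same-sign terms: once the running sum grows, each new term falls
# below half an F16 ulp and the F16 accumulator stops moving entirely,
# while the F32-style accumulator stays near the true value.
a = [0.01] * 4096
b = [1.0] * 4096
f32_sum = dot(a, b, f16_accumulate=False)
f16_sum = dot(a, b, f16_accumulate=True)
```

The same effect at scale is why attention and matmul kernels for this model need F32 accumulation on CUDA.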
Migration Steps
- It is no longer necessary to override `set_vocab`.
✨ New Features
- Add support for the JAIS-2 family of Arabic-English bilingual models from Inception AI, featuring LayerNorm (no RMSNorm), ReLU² activation, separate Q/K/V projections with biases, simple MLP, RoPE embeddings, and a GPT-2 BPE tokenizer.
- Support for JAIS-2 model sizes: Jais-2-8B and Jais-2-70B.
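A minimal sketch of the two components listed above that distinguish JAIS-2 from typical Llama-style blocks: LayerNorm, which subtracts the mean (RMSNorm does not), and the ReLU² activation used in the MLP. The function names are illustrative, not taken from the codebase.

```python
import math

def layer_norm(xs, eps=1e-5):
    """LayerNorm: subtract the mean, then divide by the standard deviation.
    RMSNorm would skip the mean subtraction and divide by the RMS only."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def relu_squared(x: float) -> float:
    """ReLU² (squared ReLU) activation: max(x, 0) squared."""
    return max(x, 0.0) ** 2
```

Per-layer scale and bias parameters are omitted here for brevity; the real graph applies learned weights after normalization.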
🐛 Bug Fixes
- Run `convert_hf_to_gguf_update.py` to regenerate the jais-2 tokenizer hash.
- Use NEOX RoPE type for JAIS2.
- Remove the Q/K permutation, as NEOX RoPE does not require it.
- Enable flash attention for JAIS2 (fixed by #19115).
- Add a dedicated JAIS2 pre-tokenizer type and control vector support, including `LLAMA_VOCAB_PRE_TYPE_JAIS2` with a cascading whitespace regex and a `build_cvec` call.
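For context on the RoPE fixes above, here is a minimal sketch of NEOX-style ("rotate-half") rotary embedding: it pairs element `i` with element `i + d/2` rather than interleaving adjacent elements as GPT-J-style RoPE does, which is why no extra Q/K permutation of the converted weights is needed. This is an illustrative re-implementation under those assumptions, not llama.cpp's actual kernel.

```python
import math

def rope_neox(x, pos, theta_base=10000.0):
    """Apply NEOX-style RoPE to one head vector x at position pos.
    Rotates the pair (x[i], x[i + d/2]) by an angle pos * theta_base^(-2i/d)."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        freq = theta_base ** (-2.0 * i / d)
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out
```

At position 0 every angle is zero, so the transform is the identity; at any position it is a pure rotation, so vector norms are preserved.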