Change8

b8106

πŸ“¦ llama-cpp
✨ 2 features · πŸ› 5 fixes · πŸ”§ 2 symbols

Summary

This release introduces full support for the JAIS-2 model architecture, including fixes for the tokenizer hash, the RoPE type, and control vector support. Note that JAIS-2 requires F32 precision accumulators on CUDA.

Migration Steps

  1. It is no longer necessary to override set_vocab in the conversion script.

✨ New Features

  • Add support for the JAIS-2 family of Arabic-English bilingual models from Inception AI, featuring LayerNorm (no RMSNorm), ReLU² activation, separate Q/K/V projections with biases, a simple MLP, RoPE embeddings, and a GPT-2 BPE tokenizer.
  • Support for JAIS-2 model sizes: Jais-2-8B and Jais-2-70B.
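The feature list above mentions a "simple MLP" with squared-ReLU (ReLU²) activation, i.e. an up-projection, activation, and down-projection with no gating branch. A minimal sketch of that block, with illustrative weight names (not llama.cpp tensor names):

```python
def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def jais2_mlp(x, w_up, w_down):
    """Hedged sketch of a squared-ReLU feed-forward block.

    Assumes a plain up -> ReLU^2 -> down MLP as described in the
    release notes; weight layouts here are illustrative only.
    """
    h = matvec(w_up, x)                 # up-projection
    h = [max(v, 0.0) ** 2 for v in h]   # ReLU² : zero negatives, square positives
    return matvec(w_down, h)            # down-projection

# With identity weights, negative inputs are zeroed and positives squared:
identity = [[1.0, 0.0], [0.0, 1.0]]
print(jais2_mlp([3.0, -2.0], identity, identity))  # -> [9.0, 0.0]
```

Unlike the SwiGLU MLPs used by LLaMA-style models, there is no gate projection, which is why the architecture is described as a "simple MLP".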

πŸ› Bug Fixes

  • Run convert_hf_to_gguf_update.py to regenerate the jais-2 tokenizer hash.
  • Use NEOX RoPE type for JAIS2.
  • Remove Q/K permutation as NEOX RoPE does not require it.
  • Enable flash attention for JAIS2 (fixed by #19115).
  • Add a dedicated JAIS2 pre-tokenizer type and control vector support, introducing LLAMA_VOCAB_PRE_TYPE_JAIS2 with a cascading whitespace regex and a build_cvec call.
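The "Use NEOX RoPE type" and "Remove Q/K permutation" fixes are related: NEOX-style RoPE rotates the pair (x[i], x[i + d/2]), matching how Hugging Face checkpoints lay out the head dimension, while the interleaved "NORM" style pairs x[2i] with x[2i+1] and therefore needs Q/K weights permuted at conversion time. A hedged sketch of the NEOX-style rotation (illustrative code, not llama.cpp's implementation):

```python
import math

def rope_neox(x, pos, theta=10000.0):
    """NEOX-style rotary embedding on one head vector.

    Rotates (x[i], x[i + d/2]) pairs, each at its own frequency.
    Because this pairing matches the HF checkpoint layout, no extra
    Q/K weight permutation is needed during conversion.
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        angle = pos * theta ** (-i / half)   # per-pair frequency
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out

# Position 0 applies zero rotation, so the vector is unchanged:
print(rope_neox([1.0, 0.0, 0.0, 0.0], pos=0))  # -> [1.0, 0.0, 0.0, 0.0]
```

The rotation is norm-preserving per pair, so only the relative position between query and key affects attention scores.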

Affected Symbols