v0.22.2

📅 Dec 2, 2025📦 tokenizersView on GitHub →

✨ 2 features🐛 3 fixes🔧 1 symbols

Summary

This release focuses on performance improvements, achieving 4x to 8x faster vocab loading with many added tokens due to GIL-free operations, alongside general typing and bug fixes.

Migration Steps

If you rely on specific internal behaviors related to token deserialization or normalization, review the changes introduced in PR #1891 and #1884.

✨ New Features

Improved typing support.
Significantly faster vocabulary loading (4x to 8x faster) when dealing with many added tokens, now utilizing GIL-free operations.

🐛 Bug Fixes

Fixed deserialization of added tokens.
Ensured `normalize_str` is used within `BaseTokenizer.normalize`.
Removed runtime stderr warning from Python bindings.

🔧 Affected Symbols

BaseTokenizer.normalize