v0.22.2
📦 tokenizersView on GitHub →
✨ 2 features🐛 3 fixes🔧 1 symbols
Summary
This release focuses on performance improvements, achieving 4x to 8x faster vocab loading with many added tokens due to GIL-free operations, alongside general typing and bug fixes.
Migration Steps
- If you rely on specific internal behaviors related to token deserialization or normalization, review the changes introduced in PR #1891 and #1884.
✨ New Features
- Improved typing support.
- Significantly faster vocabulary loading (4x to 8x faster) when dealing with many added tokens, now utilizing GIL-free operations.
🐛 Bug Fixes
- Fixed deserialization of added tokens.
- Ensured `normalize_str` is used within `BaseTokenizer.normalize`.
- Removed runtime stderr warning from Python bindings.
🔧 Affected Symbols
BaseTokenizer.normalize