Tokenizers
AI & LLMs💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Release History
v0.22.23 fixes2 featuresThis release focuses on performance improvements, achieving 4x to 8x faster vocab loading with many added tokens due to GIL-free operations, alongside general typing and bug fixes.
v0.22.1Release v0.22.1 primarily involves bumping the upper version constraint for huggingface_hub and includes several documentation updates and trainer signature improvements.
v0.22.03 fixes3 featuresThis release introduces native async bindings and adds `from_bytes`/`read_bytes` methods to WordPiece Tokenizer for WebAssembly compatibility. It also includes several bug fixes and dependency updates.
v0.21.4This release (v0.21.4) is a re-release of v0.21.3 because the initial v0.21.3 release failed.
v0.21.3Breaking1 fixThis patch release addresses Clippy fixes and resolves an introduced backward breaking change in the Rust APIs.
v0.21.2Breaking7 fixes3 featuresThis release focuses on performance optimizations, enabling broader Python no GIL support, and fixing several issues related to onig compilation and training logic.
v0.21.1Breaking4 fixes2 featuresThis release upgrades the underlying Rust bindings to PyO3 0.23, drops support for older Python versions (3.7/3.8), and includes several bug fixes related to streaming and string normalization.
v0.21.1rc0Breaking5 fixes2 featuresThis release upgrades the underlying Rust bindings to PyO3 0.23 and drops support for older Python versions (3.7, 3.8). It also introduces feature updates like support for updating template processors and several bug fixes.