Tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Latest: v0.22.2 (8 releases, 4 breaking changes)

Release History

v0.22.2 (3 fixes, 2 features)
Dec 2, 2025

This release focuses on performance: GIL-free operations make vocabulary loading 4x to 8x faster when many added tokens are present. It also includes general typing improvements and bug fixes.

v0.22.1
Sep 19, 2025

This release primarily bumps the upper version constraint for huggingface_hub. It also includes several documentation updates and trainer signature improvements.

v0.22.0 (3 fixes, 3 features)
Aug 29, 2025

This release introduces native async bindings and adds `from_bytes`/`read_bytes` methods to the WordPiece tokenizer for WebAssembly compatibility. It also includes several bug fixes and dependency updates.

v0.21.4
Jul 28, 2025

This release is a re-release of v0.21.3, whose initial release failed.

v0.21.3 (Breaking, 1 fix)
Jul 4, 2025

This patch release applies Clippy fixes and resolves a backward-incompatible change that had been introduced in the Rust APIs.

v0.21.2 (Breaking, 7 fixes, 3 features)
Jun 24, 2025

This release focuses on performance optimizations, broader support for no-GIL Python, and fixes for several issues in onig compilation and training logic.

v0.21.1 (Breaking, 4 fixes, 2 features)
Mar 13, 2025

This release upgrades the underlying Rust bindings to PyO3 0.23, drops support for older Python versions (3.7/3.8), and includes several bug fixes related to streaming and string normalization.
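
For code that must run on several interpreters, the dropped Python versions can be checked before importing the library. The following is a minimal sketch, not part of the tokenizers API: the helper name `supports_new_tokenizers` is hypothetical, and the minimum of Python 3.9 is an assumption inferred from the removal of 3.7 and 3.8.

```python
import sys

# Assumed minimum interpreter version, inferred from the release note
# that Python 3.7 and 3.8 are no longer supported.
MIN_PYTHON = (3, 9)


def supports_new_tokenizers(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON.

    Hypothetical helper for illustration; it only compares version tuples
    and does not inspect the installed tokenizers package.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if supports_new_tokenizers():
        print("This interpreter meets the assumed minimum for tokenizers 0.21.1 or later.")
    else:
        print("Python 3.7/3.8 or older detected: upgrade Python or pin an older tokenizers release.")
```

Passing the version as a parameter keeps the check easy to exercise in tests without changing the running interpreter.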

v0.21.1rc0 (Breaking, 5 fixes, 2 features)
Mar 12, 2025

This release upgrades the underlying Rust bindings to PyO3 0.23 and drops support for older Python versions (3.7, 3.8). It also adds support for updating template processors and includes several bug fixes.