max/v25.2.0
📦 mojo-language
✨ 7 features
Summary
MAX 25.2 introduces significant enhancements for large-scale AI deployment, including comprehensive NVIDIA Hopper support, multi-GPU tensor parallelism, and new memory optimization features like GPTQ quantization.
✨ New Features
- Comprehensive NVIDIA Hopper support with high-performance kernels.
- Multi-GPU tensor parallelism support for large models (e.g., Llama-3.3-70B).
- Expanded model support including Phi3, Olmo, and Granite.
- Introduction of GPTQ quantization for improved memory efficiency.
- Advanced long-context optimizations: in-flight batching, chunked prefill, and copy-on-write.
- Improved kernel caching, reducing compilation time by up to 28%.
- New Mojo GPU APIs providing developers with greater control and performance.
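
To show why tensor parallelism and quantization matter for a model the size of Llama-3.3-70B, here is a rough back-of-the-envelope sketch of weight memory per GPU. It is not part of MAX. The helper name `weight_memory_gb`, the nominal 70e9 parameter count, the 80 GB per-GPU capacity, and the 4-bit GPTQ width are all assumptions made for illustration. The estimate covers weights only and ignores activations, KV cache, and quantization metadata such as scales.

```python
def weight_memory_gb(num_params: float, bits_per_param: int, num_gpus: int = 1) -> float:
    """Approximate per-GPU weight memory in GB (10^9 bytes), weights only.

    Assumes weights are split evenly across GPUs, which is an idealization
    of tensor parallelism.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / num_gpus / 1e9


PARAMS_70B = 70e9        # assumption: nominal parameter count for a "70B" model
GPU_CAPACITY_GB = 80.0   # assumption: an 80 GB Hopper-class GPU

# 16-bit weights on one GPU: larger than the assumed capacity.
bf16_single = weight_memory_gb(PARAMS_70B, 16)

# 16-bit weights sharded across 4 GPUs with tensor parallelism.
bf16_tp4 = weight_memory_gb(PARAMS_70B, 16, num_gpus=4)

# 4-bit weights (assumed GPTQ width) on one GPU.
int4_single = weight_memory_gb(PARAMS_70B, 4)

for label, gb in [
    ("bf16, 1 GPU", bf16_single),
    ("bf16, 4 GPUs", bf16_tp4),
    ("4-bit, 1 GPU", int4_single),
]:
    fits = "fits" if gb < GPU_CAPACITY_GB else "does not fit"
    print(f"{label}: {gb:.1f} GB of weights per GPU ({fits} in {GPU_CAPACITY_GB:.0f} GB)")
```

Under these assumptions, either sharding the weights across several GPUs or quantizing them brings the per-GPU weight footprint below the capacity of a single device. Real deployments need extra headroom for the KV cache and activations.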