Change8

max/v25.2.0

📦 mojo-language
7 features

Summary

MAX 25.2 focuses on large-scale AI deployment: it adds comprehensive NVIDIA Hopper support, multi-GPU tensor parallelism, and new memory optimizations such as GPTQ quantization.

✨ New Features

  • Comprehensive NVIDIA Hopper support with high-performance kernels.
  • Multi-GPU tensor parallelism support for large models (e.g., Llama-3.3-70B).
  • Expanded model support including Phi3, Olmo, and Granite.
  • Introduction of GPTQ quantization for improved memory efficiency.
  • Advanced long context optimizations: in-flight batching, chunked prefill, and copy-on-write.
  • Improved kernel caching, reducing compilation time by up to 28%.
  • New Mojo GPU APIs providing developers with greater control and performance.
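
To make the tensor parallelism item above more concrete, here is a minimal sketch of the general idea in plain Python. It is not the MAX API: the helper names (`matmul`, `split_columns`, `tensor_parallel_matmul`) are hypothetical, the "GPUs" are simulated as list shards, and it assumes a column-wise split of a weight matrix, which is one common way to do tensor parallelism. Each shard computes its slice of the output independently, and the slices are concatenated at the end (the role an all-gather plays across real devices).

```python
# Illustrative sketch of column-wise tensor parallelism in plain Python.
# All names here are hypothetical; this does not use or mirror the MAX API.

def matmul(x, w):
    """Multiply x (m x k) by w (k x n); both are lists of rows."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def split_columns(w, num_shards):
    """Split w column-wise into equal shards, one per simulated device."""
    n = len(w[0])
    if n % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = n // num_shards
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(num_shards)]

def tensor_parallel_matmul(x, w, num_shards):
    """Compute x @ w shard by shard, then concatenate the output slices."""
    # Each shard only needs its own columns of w, so per-device weight
    # memory shrinks by a factor of num_shards.
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenate row-wise; on real hardware this is an all-gather.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = tensor_parallel_matmul(x, w, num_shards=2)
print(full == sharded)  # True: sharded result matches the single-device result
```

The point of the split is that no single device has to hold the full weight matrix, which is what makes models in the 70B-parameter range feasible across several GPUs. A production implementation would also have to handle communication cost and row-wise splits for the following layer, which this sketch leaves out.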