Deploy efficient inference with speculative decoding for faster generation
MiMo includes Multiple-Token Prediction (MTP) layers that enable speculative decoding, with a reported acceptance rate of about 90% and reported speedups of 2.29× in training and 1.96× in validation. Supported in vLLM, SGLang, and HuggingFace transformers.
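
To give a rough sense of what a ~90% acceptance rate means for generation, the sketch below estimates how many tokens one verification pass of the main model yields on average. It is a back-of-the-envelope model, not MiMo's measurement: it assumes every draft token is accepted independently with the same probability, and it ignores the cost of producing the drafts. The function name `expected_tokens_per_step` and its parameters are hypothetical, chosen for illustration.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft_tokens: int) -> float:
    """Estimate tokens emitted per verification pass of the main model.

    Assumption (for illustration only): each draft token is accepted
    independently with probability `acceptance_rate`. Real acceptance
    varies with position, prompt, and sampling settings.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if num_draft_tokens < 0:
        raise ValueError("num_draft_tokens must be non-negative")

    # Every pass emits at least one token (the i = 0 term). Draft token i
    # is only kept if all drafts before it were also accepted, which
    # happens with probability acceptance_rate ** i.
    return sum(acceptance_rate ** i for i in range(num_draft_tokens + 1))


if __name__ == "__main__":
    # Compare a few draft lengths at the ~90% acceptance rate quoted above.
    for k in (0, 1, 2, 3):
        print(k, round(expected_tokens_per_step(0.9, k), 3))
```

Under these assumptions, a single draft token at 90% acceptance yields close to two tokens per pass. The end-to-end speedup is lower than this figure in practice, because drafting and verification add their own overhead.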
