Aligning to What? Rethinking Agent Generalization in MiniMax M2
MiniMax M2 targets both benchmark performance and real-world robustness, arguing that agent alignment goes beyond scoring well on isolated tests. It introduces Interleaved Thinking, in which the model keeps reasoning internally as a task unfolds, so that it can cope with long-horizon tasks and continuous external perturbations. True generalization, the post contends, comes from adapting to perturbations and using tools more broadly, not from merely scaling the number of tools.
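To make the idea of reasoning that continues throughout a task more concrete, here is a minimal sketch of an agent loop that re-reasons after every observation instead of planning once up front. It is an illustration only, not MiniMax M2's implementation: the `Agent` class, the rule-based `think` method (a stand-in for a model's reasoning step), and the `primary` and `backup` tools are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    kind: str      # "think", "act", or "observe"
    content: str


@dataclass
class Agent:
    # Hypothetical tool registry: tool name -> callable taking a query string.
    tools: dict[str, Callable[[str], str]]
    history: list[Step] = field(default_factory=list)

    def think(self) -> str:
        """Rule-based stand-in for a model's reasoning step (an assumption).

        It looks at the most recent entry in the history, so every new
        observation, including a failure, can change the next action.
        """
        last = self.history[-1].content if self.history else ""
        if last.startswith("ERROR"):
            thought = "primary tool failed; fall back to backup"
        elif last.startswith("OK"):
            thought = "done"
        else:
            thought = "try primary tool"
        self.history.append(Step("think", thought))
        return thought

    def run(self, goal: str, max_steps: int = 5) -> str:
        for _ in range(max_steps):
            thought = self.think()          # reason before every action
            if thought == "done":
                # The entry just before this thought is the successful observation.
                return self.history[-2].content
            tool = "backup" if "backup" in thought else "primary"
            self.history.append(Step("act", tool))
            try:
                obs = "OK: " + self.tools[tool](goal)
            except Exception as exc:        # a perturbation: the tool call failed
                obs = f"ERROR: {exc}"
            self.history.append(Step("observe", obs))
        return "gave up"


def flaky_primary(query: str) -> str:
    # Simulated external perturbation: this tool always times out.
    raise RuntimeError("timeout")


def backup(query: str) -> str:
    return f"result for {query}"


agent = Agent(tools={"primary": flaky_primary, "backup": backup})
result = agent.run("weather")
trace = [step.kind for step in agent.history]
```

In this toy run the first tool call fails, the next reasoning step sees the error in the history and switches to the backup tool, and a final reasoning step recognizes success. A loop that planned only once at the start would have no point at which to notice the failure and change course.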