Article · HuggingFace Blog

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM and UC Berkeley use ITBench and MAST to diagnose why enterprise LLM agents fail at IT automation tasks (incident triage, logs/metrics, Kubernetes). They show that benchmark success rates hide root causes: frontier models tend to fail at isolated verification bottlenecks, while large open models suffer cascading failures that start from early reasoning mismatches. The work proposes concrete agent design patterns to improve reliability.

Published 18 Feb 2026 · ★★★★★

Read the source: huggingface.co/blog/ibm-research/itbenchandmast

Source: HuggingFace Blog
Ingested: 18 Feb 2026 · 19:10
Editorial score: 5.0 / 5